Audio Engineering: the Next 40 Years Page 2

Let's conservatively assume that the resolution and accuracy of gesture technology will double every two years (though given the economic incentives, doubling each year may be more realistic). Common gestural devices ($100 at 150 pixels per inch [PPI], by today's standards) will boast two orders of magnitude greater resolution by about 2025. Costing only $1 for the ability to map and track 15,000 3D positions per inch, such devices will allow for much greater degrees of freedom and movement (think Minority Report without the $1 million price tag). By 2025–2030, the price of sophisticated, high-resolution, free-air gestural control will have fallen to commodity levels and the devices will be mass-produced.
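As a rough check on the arithmetic above, here is a minimal Python sketch that compounds a doubling period from today to 2025. The 2013 baseline year is an assumption (taken from the date of the photos in this article), and the name `growth_factor` is purely illustrative.

```python
def growth_factor(start_year, end_year, doubling_period_years):
    """Multiple gained when a quantity doubles every `doubling_period_years`."""
    doublings = (end_year - start_year) / doubling_period_years
    return 2 ** doublings


BASELINE_YEAR = 2013  # assumption: "today" in the article
BASELINE_PPI = 150    # resolution of a common $100 gestural device, per the text

conservative = growth_factor(BASELINE_YEAR, 2025, 2.0)  # doubling every two years
aggressive = growth_factor(BASELINE_YEAR, 2025, 1.0)    # doubling every year

print(f"every 2 years: {conservative:.0f}x -> {BASELINE_PPI * conservative:,.0f} PPI")
print(f"every year:    {aggressive:.0f}x -> {BASELINE_PPI * aggressive:,.0f} PPI")
```

Under these assumptions the two-year rate gives 64x (about 9,600 PPI) and the one-year rate gives 4,096x. The figure of roughly two orders of magnitude (15,000 positions per inch) falls between the two scenarios, closer to the conservative one.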

Will gestural control replace touchscreens and mice by 2025? No. But the transition will be well under way. Clearly, the next 40 years of human/computer interaction will be free-space and gestural.

Spherical Audio
Let's move on to 3D virtualization. We need to think systemically, with video, audio, and head-motion tracking all working together seamlessly. We'll start with virtualized audio.

Both gaming and film are quickly moving into providing a sense of total audio immersion. In real acoustic spaces such as movie theaters, we're seeing the delivery of spherical audio from emerging technologies like Dolby Atmos, DTS Neo, and Barco Auro. However, these immersive real-space technologies require more speakers and amplifiers, more expense, and a great deal more work to maintain—all things that consumers embrace slowly, if at all. The average consumer has balked at six speakers. Requiring 10, 14, or 22 speakers, and the amps to drive them, is a nonstarter.

Market realities suggest that the primary thrust of 3D audio innovation will occur through headphones. Already, first-generation 3D headphone products such as DTS Headphone:X are breaking ground. Over the next decades, popular gaming and entertainment media will lead the relentless push toward fully immersive audio realism, predominantly over headphones.

Legitimate, full-coverage headphones (not only earbuds) have exploded into the mass consciousness in just the last few years, and the trend will only accelerate. Popular culture is becoming increasingly conditioned to accept "cans" as a primary method of consuming audio.

Jimmy Iovine and Andre Young, aka Dr. Dre, the creators of Beats by Dr. Dre headphones, have arguably done more than anyone else to position headphones as a generational, cultural, and global style statement. Beats now sells well over $1 billion of consumer audio products every year, and has captured more than 60% of the market in headworn audio products costing more than $100. And Beats isn't just following technology trends—it aims to lead, having recently contributed $70 million to the University of Southern California for the brand-new USC Jimmy Iovine and Andre Young Academy for Arts, Technology, and the Business of Innovation.

There are now entire stores devoted to headworn technology. I recently spotted one such store, the Headphone Hub, at Houston's Bush International Airport (see photo). This is not a fad: over the next 20–30 years, 3D soundfield production and design will be one of the biggest growth areas in audio delivery via headphones. Microphone designers, headphone makers, audio software engineers, and postproduction engineers will move from today's paradigm of x-dot-x channels (5.1, 7.2, etc.) to a seamlessly spherical, object-oriented soundfield.


Headphone Hub, Bush International Airport, Houston, in September 2013. Photo: John La Grou

If we plot a chart of 3D audio growth with a projection of it doubling every two years (fig.5), today's $1000 3D audio solution will be commodity priced by 2025, combined with a hundredfold improvement in spatial and timbral resolution experience over headphones.


Fig.5 Immersive 3D audio growth.

Conservatively, by 2025–2030 we should expect that highly realistic immersive audio will be part of every low-cost portable device, gaming console, and home entertainment system. And by about 2040, on-ear audio will rival or exceed the subjective performance of today's best audiophile rooms and loudspeakers. Moreover, in a very short time, perhaps as soon as 2020, common commercial music will be routinely mixed in full 3D immersion and delivered in an open-source format, most likely a derivative of Dolby Atmos or DTS Neo.

Virtualized Visuals
Virtualized imagery plays a central role in the future of audio production. The future of headworn visual displays is clear: higher resolution, finer dot pitch, better dynamic range, lower latency, and, of course, relentless evolution toward three-axis immersion as our standard image format.

By now, many of us have seen photos of and read articles about the prototype of the Google Glass, a headworn computer with a head-mounted display (see photo). Sources claim that the Glass will be available in 2014 for a street price of around $400. This is a paradigm shift. If there were only one takeaway from this brief look into the future, it should be this: We are moving from a culture of handheld devices to one of headworn devices.

Sergey Brin, co-founder of Google, wearing Google Glass. Photo: Reuters/Carlo Allegri

It won't be long before smart mobile computers are designed into small, lightweight, headworn devices not unlike the Google Glass, but increasingly more powerful and ubiquitous. Vendors such as Apple, Intel, Microsoft, Oakley, Olympus, Samsung, and Sony, along with at least a dozen startups, are all reportedly developing headworn smart-mobile devices.

While Google and others are defining the mainstream of headworn gear, I think there's another kind of device that's more directly applicable to the future of audio and media production: gaming displays. Of all the gaming displays now in development, I think one of the most important is the Oculus Rift (see photo).


Sergey Orlovskiy using the developer kit version of the Oculus Rift (with separate headphones).

The Rift has one discrete video display per eye, for true 3D (the resolution in development is true 1080p), and unrestricted head-motion tracking: If you turn your head, the audio and visual elements of the scene move with you in lifelike, immersive realism.

Observing gamers using the Oculus Rift, I feel that we're seeing the future of display technology. To see what I mean, watch the YouTube video "The Best and Funniest Oculus Rift Reactions." What you'll see is the most deeply convincing, fully immersive virtual-reality experience to date. The experience of Rift reality can be uncanny enough to be disturbing. Oculus plans to have shipped their first commercially available product by the time you read this.

To return to my analysis of trends: The comprehensive cost-performance efficiency (CPE) of video displays since 1980 shows a doubling roughly every 18 months (footnote 2; a doubling of efficiency every year from now on would not be surprising). Thus, by 2025, the CPE of immersive displays will be at least 100 times better, at a commodity-priced entry point (fig.6). By 2035, immersive visuals will be at least 10,000 times more powerful than today; and by 2050, we can reasonably project that commodity-grade, headworn virtuality will be nearly indistinguishable from what we see with our own eyes in real space. We also know that head displays will be much smaller and lighter, and perhaps use a technique called direct projection, in which images are projected (scanned) directly onto the human retina one pixel at a time.
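The "at least" in these projections can be sanity-checked by turning the question around and asking how long an 18-month doubling takes to reach each multiple. The sketch below does that in Python. The 2013 baseline year is an assumption, and `years_to_reach` is an illustrative name, not part of any published model.

```python
import math


def years_to_reach(target_factor, doubling_period_years):
    """Years needed to improve by `target_factor` at a fixed doubling period."""
    return doubling_period_years * math.log2(target_factor)


BASELINE_YEAR = 2013   # assumption: "today" in the article
DOUBLING_PERIOD = 1.5  # 18 months, per the trend described above

for target in (100, 10_000):
    years = years_to_reach(target, DOUBLING_PERIOD)
    print(f"{target:>6,}x after {years:.1f} years (about {BASELINE_YEAR + years:.0f})")
```

With these inputs, a 100-fold gain arrives after about 10 years and a 10,000-fold gain after about 20 years. Both land a year or two ahead of 2025 and 2035, which is consistent with stating the projections as lower bounds.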


Fig.6 Video display cost-performance trend.

Head-Motion Tracking
Immersive sound and picture would be impossible if they did not "track" with natural movements of the head. When you turn your head, the virtual sound and picture must react as they would in real sensory time and space. Effective head-tracking requires near-zero latency response, with high spatial resolution in all axes of head movement. However, head-motion tracking is a relatively young technology using various sensing methods: IR-optical, e-field, RF, and so forth.
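To make the tracking and latency requirements concrete, here is a minimal Python sketch for a single axis (yaw). It is a simplified illustration, not a description of how any particular tracker works. The sign convention (positive angles to the listener's left), the function names, and the example figures of 200 degrees per second and 20 ms are all assumptions chosen for illustration.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Direction of a world-fixed source relative to where the head points."""
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


def lag_error_degrees(head_speed_deg_per_s, latency_ms):
    """Angular error that builds up while the rendering lags the head."""
    return head_speed_deg_per_s * latency_ms / 1000.0


# A source straight ahead (0 degrees); the listener turns 30 degrees to the left.
# To stay fixed in the room, the source must now be rendered 30 degrees to the right.
print(relative_azimuth(0.0, 30.0))

# Assumed figures: a 200 deg/s head turn and 20 ms of end-to-end latency.
print(lag_error_degrees(200.0, 20.0))
```

In this example the source is rendered at -30 degrees after the turn, and the assumed 20 ms of lag leaves the scene 4 degrees out of place while the head is moving. That is why latency has to be pushed toward zero.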

Footnote 2: Display technology cost-performance weighted sum of Resolution, Color Depth, Dynamic Range, Latency, Dot Pitch, Refresh, Contrast Ratio, Viewing Angle, Brightness, and Energy Use . . . vs cost.

IgAK's picture

The "progress" is no surprise, but the thought of the constant stream of expenditures to replace what will no longer work, or no longer be "supported", or just to keep up with everyone else really isn't all that attractive! There is such a rush to get everything to market before it obsoletes or loses sales appeal nowadays that nothing works properly and everything is patched on top of patches and then all too soon has to be replaced entirely - long before "they" even get it right in the first place. I just hope those gimmicky free-air gestural controls understand emphatic one-fingered gestures when the control interfaces work as poorly as the mechanical ones do now to let the designers know what we think of the half-baked breakware and horrendously poorly thought-out controls they keep sticking us with.

I'm just glad I stuck with the still-best audio technology around for stability - my vinyl gear - while a constant revolving-door stream of that wonderful new digital stuff keeps coming and going through my listening room as it obsoletes every few months. Technology has become so terrific that the freshest breath of air is ironically what does not have to be replaced so darn often! That's what has become a really novel pleasure anymore. Call me old if you like, but I yearn for the stability of those days when things didn't change so much. Dynamics are wonderful, but at near 120 dB, we will have as much as we can possibly go deaf if that's sustained. Broke, too.

dalethorn's picture

I like the idea of tech that removes redundant and unnecessary physical motions and other tedium. But I would prefer my music played as it is live. Somehow the idea of "immersive" sound seems like a gimmick - a distraction - unless the recording were engineered for that sound experience right from the start. Undoubtedly there will be wonderful recordings done for immersive listening, but for audiophiles who are still in stereo in 2014, the interesting thing will be partly how much the availability of great stereo recordings will diminish with time, partly how many tinkered and remixed versions of older recordings will appear that push the prior versions aside on the virtual shelves, and how many high-quality remasterings of great classic recordings won't get done due to prioritizing the 'new' sound. The second thing I mentioned is very important for persons who are exploring things that are new to them, and will have to wade through ever greater lists of junk to get to the better recordings. Stereophile may serve a greater purpose in the future, as an invaluable filter to help get people to those better recordings.

Alan Tomlinson's picture

Changing technology is a mixed blessing. When digital audio came out, it sounded truly repulsive (see Bop Til You Drop). Now there are genuinely excellent digital recordings and reproduction systems. One of the primary reasons that digital recording improved (as opposed to digital reproduction) is that a few excellent, highly skilled recording engineers (e.g. George Massenburg) said 'this sounds like crap, how can I make it better?'. The reason that these engineers were able to change digital recording is that they knew how great recordings sounded, and they knew how great recordings sounded because they had worked in environments where it was possible to record bands together and therefore they were able to hear how those bands sounded in a favorable acoustic environment.

Many of these crucial aspects of fine engineering are gone. There are few good-sounding recording rooms left, which has led to the audio industry producing much less top-end equipment. More importantly though, there are few(er) people working in recording who actually know what a band sounds like in a good room and, crucially, how to capture the sound of a band in a good room. Music lives through the connection of musicians playing together in the same place at the same time. That's not to say that it's impossible to do it without that, just that it's harder.

I have found very few recordings in the last 20 years that sounded really good that were made by anyone who hadn't come up in recording studios when there were a good many of them still around. Where will the quality recording engineers of the future come from when they have no idea how to make things sound good?



Alan Tomlinson

Rick Tomaszewicz's picture

...can't wait for ultra realistic, all immersive future performances that are BORING.  Too bad new music isn't evolving at the same rate as its recording and playback technologies. 

Maybe we expect less from new music than we do from its delivery mode. Do we really want to hear Bach, Miles, Frank and the Beatles etc. through yet another new and improved playback system? And each of these technical improvements often requires the music to be repurchased in a new format at a higher price.

These changes are driven by profit, technical and artistic imperatives/opportunities.  The last, but most important of these, is diminishing in importance.

In the distant past, only the wealthy and sophisticated heard the best music because it had to be heard live.  Artists strove to satisfy and impress the highest common denominator.  Once music could be recorded, more people had access to it.  And, once music became portable, everyone could hear it and reset market expectations.  Most artists try to deliver what will sell.  (They gotta eat and pay rent.)  

There's been plenty written about what sells today; they call it aural wallpaper.  ITM, I'll keep thrift store hunting for $1 vinyl and $2 CDs and supporting new artists who aim up.  

Rick Tomaszewicz's picture

Just got back from the Salvation Army Thrift Shop.  Big sale on CD's; three for a buck. Seventy classical CD's for $23!  Sorry, high-resolution-download-seeking crowd, that would have cost you $1,260.  

Stop worrying about having the latest technology.  Just enjoy the music.


corrective_unconscious's picture

I think those dynamic range figures are largely theoretical, even in the cases where qualified by descriptors such as "pristine conditions." I also think those dynamic range figures are not including the limits imposed by ambient sound in the listening environment - I think they pertain (supposedly) only to the recording and playback chain itself.

And the ultimate, real world dynamic range capability of a given medium is far removed from how much of that capability actually gets used by a particular recording.

As others have mentioned, no one could think those early digitally recorded CDs sounded better than most of the even fairly well recorded and produced LP records. No one could think where the market has gone, in bulk, compressed, lossy digital files, offers better sound than Redbook CDs.

Music_Guy's picture

Reviewers talk about systems and content so good that they take the listener into the concert hall or into the studio...

Funny, but for me, the experience I prefer is one of bringing the performance into my space. When I sit down to listen to pre-recorded music, I want it to be clear, dynamic and musical. It is just fine for me that the sound blends with my own acoustic space. While I look for improvements in clarity and dynamics in my system, I do not yearn for more spatial cues. It is uncanny for me how well a good two-channel setup does this.

I know the gamers and the videophiles will be pushing the industry towards total immersion. Digital signal processing is amazing! (price/performance) I sure wish there would be similar advances in improving clarity of two-channel reproduction as well, in both source material and reproduction equipment.

(And, as others have said, I wish that there was improvement in the dynamic flow encoded in the source material without such a big price or analog..)


wozwoz's picture

Moore's Law, doubling of computing power etc is all very nice ... and yet, in the last 15 years, the quality of music as consumed by most people has gone dramatically backwards --- not forwards. For most, it has got worse --- not better. Oh sure ... for the audiophile elite, we have hi-rez SACD and DSD ... but the simple fact is most consumers are now streaming at 100th of the bit-rate of the studio masters over their iPhones, and if they even have standalone speakers, they are often  'effectively' mono docking stations, or USB sticks attached to a computer. 

Perhaps for the first time in audio history, technology is causing sound quality to decline, not improve. Sadly, I don't think things will change much in 50 years. 

ppgr's picture

Who needs dynamic range when commercial music rarely exceeds 12 dB and is getting more and more compressed?