Audio Engineering: the Next 40 Years Page 3

Nevertheless, gestural-control and head-tracking technologies share many of the same design attributes and appear to be maturing at similar rates. Head tracking requires six degrees of freedom (6DOF): translation along the X, Y, and Z axes plus rotation about each axis, known as pitch, yaw, and roll. Popular head-motion tracking systems for gaming today cost around $200 and offer about 640x480 pixels of raw resolution, a sample rate of 100 frames per second, and latencies of less than 10 milliseconds. Lab-grade units with better resolution and response are also available.
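To make the 6DOF idea concrete, here is a minimal Python sketch of a head pose: three translations plus yaw, pitch, and roll, combined into a single rotation matrix. The class name HeadPose, the axis assignments, and the Z-Y-X rotation order are assumptions chosen for illustration; real trackers use a variety of conventions. The helper frame_interval_ms (also hypothetical) simply shows that a 100-frames-per-second sample rate delivers one new pose every 10 milliseconds.

```python
import math
from dataclasses import dataclass


@dataclass
class HeadPose:
    """Hypothetical 6DOF head pose: position in metres, angles in radians."""
    x: float
    y: float
    z: float
    yaw: float    # assumed: rotation about the vertical (Z) axis
    pitch: float  # assumed: rotation about the lateral (Y) axis
    roll: float   # assumed: rotation about the forward (X) axis

    def rotation_matrix(self) -> list[list[float]]:
        """Return R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (Z-Y-X convention)."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def rotate(self, v: tuple[float, float, float]) -> tuple[float, ...]:
        """Rotate a head-relative direction vector into world coordinates."""
        r = self.rotation_matrix()
        return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


def frame_interval_ms(fps: float) -> float:
    """Time between successive tracker samples, in milliseconds."""
    return 1000.0 / fps


# A 90-degree yaw turns the forward (X) direction into the sideways (Y) one.
pose = HeadPose(0.0, 0.0, 0.0, yaw=math.radians(90), pitch=0.0, roll=0.0)
print([round(c, 6) for c in pose.rotate((1.0, 0.0, 0.0))])
print(frame_interval_ms(100))  # 10.0
```

A head-tracked renderer would apply the inverse of this rotation to the sound field or image, so that virtual sources stay fixed in space as the listener's head turns.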

Assuming a doubling of CPE every two years, high-resolution head-motion tracking should reach commodity status by 2025, offering a larger field of use and nearly imperceptible latency. By 2025, IC manufacturers will offer low-cost second- or third-generation silicon tracking solutions that will be used in most headworn devices. And by 2035, ultra-high-resolution, low-cost head-motion tracking will be part of every virtual-reality device.
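The two-year doubling assumption is simple exponential arithmetic: after t years, capability grows by a factor of 2^(t/2). A minimal sketch, taking 2014 as the starting year (an assumption; no base year is given here) and using a hypothetical helper named growth_factor, shows that this works out to roughly 45 times by 2025 and roughly 1,450 times by 2035.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Improvement factor after `years` under a simple doubling model."""
    return 2.0 ** (years / doubling_period_years)


BASE_YEAR = 2014  # assumption: roughly when this article was published

for target_year in (2025, 2035):
    factor = growth_factor(target_year - BASE_YEAR)
    print(f"{target_year}: about {factor:,.0f}x the {BASE_YEAR} level")
# 2025: about 45x the 2014 level
# 2035: about 1,448x the 2014 level
```

The model assumes the doubling rate stays constant over the whole period; any slowdown or acceleration shifts these dates accordingly.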

Over the next two decades, therefore, all of these virtual technologies, along with others I haven't mentioned, such as haptics (tactile feedback technology), will converge into a single media ecosystem. As this happens, the way we produce and consume A/V media will radically change.

By 2025, the transition from mouse to gesture will be well under way, and by 2035 both the mouse and touchscreen paradigms will be fading. By 2025, full-immersion headworn video displays will be rapidly displacing external monitors, especially for gaming. And by 2040, most individual visual-interface applications (home entertainment, business, mobile, etc.) will be headworn.

Today's media-production studios are increasingly migrating from hardware to software applications, yet most professional studios remain anchored to rooms full of hardware. This will change. As A/V headgear becomes more convincing, with immersive performance that more faithfully mimics the physical world, almost all media postproduction will migrate to the virtual domain. The few exceptions will be body-sensed acoustics (haptic feedback or subwoofers, for example).

When we converge all of our technology projections into a single media ecosystem, we recognize that high-resolution visual editing, audio mixing and mastering, game development, music composition, and other A/V production and postproduction tasks will be performed predominantly in the virtual domain by 2040, if not earlier.

Likewise, by 2040—and perhaps as early as 2025—most A/V and gaming will be delivered with stark realism via low-cost headworn devices. And when we combine artificial intelligence with full-immersion virtual reality, the line between production and consumption will blur. With assistance from deep AI running on massively powerful CPU–GPU processors, media consumers will become media creators, participating with others in new forms of self-organizing virtual stories.

The era of fully virtual A/V postproduction is almost here, and in some ways has already begun. Post rooms with giant mixing consoles, racks of outboard hardware and patch panels, video editing suites, external video and audio monitors, touchscreens and physical input devices, and large acoustic architectures will become historical curiosities.

Every functional piece of "production equipment"—every knob, fader, switch, screen, indicator, meter, and patch point—will be visible and gesture-controllable entirely in immersive space. Audio and visual monitoring will migrate from big rooms of external hardware to increasingly lightweight and human-adapted headworn devices. The keyboard and mouse will be replaced by spoken commands and gestures made in free space. And if you really need a keyboard, it will be provided, complete with tactile (haptic) feedback—in virtual space.

Today, a $400 Sony PlayStation 4 employs some 5 billion transistors with 2-teraflop graphics processing—or, according to Ray Kurzweil, about the same computing power as one mouse brain. By 2025, a commodity gaming console will be nearly 10,000 times more powerful than today's machines (fig.7). That's the processing power of a human brain sitting on your desktop—roughly equivalent to the power of IBM's most powerful supercomputer in 2008 (footnote 3).
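It can help to restate the 10,000-fold figure in the same terms as the two-year doubling assumption used for head tracking. A minimal sketch, assuming "today" means 2014 and taking the quoted 2-teraflop figure at face value; implied_doubling_period is a hypothetical helper. Multiplying 2 teraflops by 10,000 gives about 20 petaflops, and reaching that in 11 years implies a doubling roughly every 10 months, a considerably faster pace than doubling every two years.

```python
import math


def implied_doubling_period(factor: float, years: float) -> float:
    """Doubling period (in years) that yields `factor` growth over `years`."""
    return years / math.log2(factor)


BASE_YEAR = 2014    # assumption: "today" in the text
TARGET_YEAR = 2025
PS4_FLOPS = 2e12    # ~2 teraflops, as quoted above
SPEEDUP = 10_000    # "nearly 10,000 times more powerful"

years = TARGET_YEAR - BASE_YEAR
print(f"projected console throughput: {PS4_FLOPS * SPEEDUP / 1e15:.0f} petaflops")
print(f"implied doubling period: {implied_doubling_period(SPEEDUP, years) * 12:.0f} months")
# projected console throughput: 20 petaflops
# implied doubling period: 10 months
```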


Fig.7 Media creation computing power trend.

With effectively unlimited processing power and profoundly advanced AI, our future production tools will allow us to call up a complete symphony orchestra in any concert hall of our choosing. Let's add a 200-voice choir, or maybe a great soprano or piano soloist out front. Systems for creating immersive media will allow us to input our own music and interact with each desk of a symphony orchestra—or a gamelan orchestra, or whatever—of any size, and in any space, assuming our desired instruments and acoustic space have been characterized. Gestural and voice commands will make refinements to the score and performance, just as a conductor would rehearse an orchestra in real space, until the ensemble plays exactly as we desire.

On the delivery side, room speakers and monitor screens will not go away. Casual and background A/V environments (cars, businesses, homes) will continue to drive a real-space market. But for the audiophile and videophile worlds, the demographic and technical trend data suggest that within 15 to 20 years we will be well into a transition away from big amplifiers, big speakers, big screens, and big rooms to put them in, and toward an ultra-high-resolution headworn experience that will exceed today's best real-space performance.

Well-recorded music will no longer be subject to wildly variable room acoustics. A recording's spatial and timbral realism will remain far more consistent for all listeners. Unless our technical trends abruptly stop, real-space audiophile and videophile markets will likely begin to decline, perhaps as soon as 2025–30, as new generations experience the superior sense of immersive 3D realism offered by accelerated improvements in headworn technology.

Technology curves show that electromechanical devices have been halving in size every 30 months for the last 50 years. This suggests that headworn A/V-immersion hardware will continue to shrink as its resolving power increases. It's not much of a stretch to envision ultra-lightweight "transparent headgear" that keeps your eyes and ears open to your real-space environment while providing immersive qualities on demand: for instance, physically unnoticeable headphones that let you hear in-room sounds just as you would with uncovered ears.
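Halving every 30 months for 50 years compounds to 600 / 30 = 20 halvings, a reduction to about one-millionth of the original size. A minimal sketch of that arithmetic, using a hypothetical helper named size_ratio and assuming the halving rate stays constant:

```python
def size_ratio(months: float, halving_period_months: float = 30.0) -> float:
    """Fraction of the original size remaining after `months` of steady halving."""
    return 0.5 ** (months / halving_period_months)


fifty_years = 50 * 12
halvings = fifty_years / 30
print(f"{halvings:.0f} halvings -> 1/{1 / size_ratio(fifty_years):,.0f} of the original size")
# 20 halvings -> 1/1,048,576 of the original size
```

At the same rate, another ten years would mean four more halvings, shrinking today's headgear to about one-sixteenth of its present size.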

More than a century ago, Oscar Wilde noted that "life imitates art." Today, technology imitates science fiction. In the not-too-distant future we'll wear holodecks on our heads. The future of music, audio, filmmaking, gaming, and indeed any creative media construction, from inception to postproduction to delivery, is boundless, limited only by our imaginations (footnote 4).

About the author: John La Grou is founder and chair of POW-R, the organization behind the world's leading audio bit-length reduction algorithms. Roughly one-third of all CD and downloaded music is processed with POW-R. He is also founder and CEO of Millennia Media, a design leader in critical audio recording, live sound, postproduction, mastering, and archiving. Millennia is the world's most popular front-end for film scoring and classical music recording, and Millennia's phono preamplifiers are used by the Library of Congress to archive its collection of three million historic audio recordings. John presented an earlier version of this article as the Sunday lunchtime keynote address at the 135th Audio Engineering Society Convention in New York, October 2013.

Footnote 3: Projections of processing speeds since 1990 show that it takes about 17 years for supercomputer power to migrate to commodity desktop and mobile devices.

Footnote 4: Special thanks to Ray Kurzweil.


IgAK

The "progress" is no surprise, but the thought of the constant stream of expenditures to replace what will no longer work, or is no longer "supported", or just to keep up with everyone else really isn't all that attractive! There is such a rush to get everything to market before it obsoletes or loses sales appeal nowadays that nothing works properly, everything is patched on top of patches, and then all too soon it has to be replaced entirely - long before "they" even get it right in the first place. I just hope those gimmicky free-air gestural controls understand emphatic one-fingered gestures, so that when the control interfaces work as poorly as the mechanical ones do now, we can let the designers know what we think of the half-baked breakware and horrendously poorly thought-out controls they keep sticking us with.

I'm just glad I stuck with the still-best audio technology around for stability - my vinyl gear - while a constant revolving-door stream of that wonderful new digital stuff keeps coming and going through my listening room as it obsoletes every few months. Technology has become so terrific that, ironically, the freshest breath of air is whatever does not have to be replaced so darn often! That has become a really novel pleasure these days. Call me old if you like, but I yearn for the stability of those days when things didn't change so much. Dynamics are wonderful, but at nearly 120 dB we will have as much as we can possibly take, and we'll go deaf if that's sustained. Broke, too.

dalethorn

I like the idea of tech that removes redundant and unnecessary physical motions and other tedium. But I would prefer my music played as it is live. Somehow the idea of "immersive" sound seems like a gimmick - a distraction - unless the recording was engineered for that sound experience right from the start. Undoubtedly there will be wonderful recordings made for immersive listening, but for audiophiles who are still in stereo in 2014, the interesting thing will be partly how much the availability of great stereo recordings diminishes with time, partly how many tinkered and remixed versions of older recordings appear and push the prior versions aside on the virtual shelves, and partly how many high-quality remasterings of great classic recordings won't get done because the 'new' sound gets priority. The second thing I mentioned is very important for people who are exploring music that is new to them and who will have to wade through ever greater lists of junk to get to the better recordings. Stereophile may serve a greater purpose in the future, as an invaluable filter to help get people to those better recordings.

Alan Tomlinson

Changing technology is a mixed blessing. When digital audio came out, it sounded truly repulsive (see Bop Till You Drop). Now there are genuinely excellent digital recordings and reproduction systems. One of the primary reasons that digital recording (as opposed to digital reproduction) improved is that a few excellent, highly skilled recording engineers (e.g. George Massenburg) said, 'this sounds like crap, how can I make it better?'. These engineers were able to change digital recording because they knew how great recordings sounded, and they knew how great recordings sounded because they had worked in environments where it was possible to record bands together, and therefore they were able to hear how those bands sounded in a favorable acoustic environment.

Many of these crucial aspects of fine engineering are gone. There are few good-sounding recording rooms left, which has led to the audio industry producing much less top-end equipment. More importantly, though, there are few(er) people working in recording who actually know what a band sounds like in a good room and, crucially, how to capture the sound of a band in a good room. Music lives through the connection of musicians playing together in the same place at the same time. That's not to say that it's impossible to do it without that, just that it's harder.

In the last 20 years I have found very few really good-sounding recordings that were made by anyone who hadn't come up in recording studios back when there were still a good many of them around. Where will the quality recording engineers of the future come from when they have no idea how to make things sound good?



Alan Tomlinson

Rick Tomaszewicz

...can't wait for ultra-realistic, all-immersive future performances that are BORING. Too bad new music isn't evolving at the same rate as its recording and playback technologies.

Maybe we expect less from new music than we do from its delivery mode. Do we really want to hear Bach, Miles, Frank, the Beatles, etc. through yet another new and improved playback system? And each of these technical improvements often requires the music to be repurchased in a new format at a higher price.

These changes are driven by profit, technical, and artistic imperatives/opportunities. The last of these, though the most important, is diminishing in importance.

In the distant past, only the wealthy and sophisticated heard the best music because it had to be heard live. Artists strove to satisfy and impress the highest common denominator. Once music could be recorded, more people had access to it. And once music became portable, everyone could hear it, and that reset market expectations. Most artists try to deliver what will sell. (They gotta eat and pay rent.)

There's been plenty written about what sells today; they call it aural wallpaper. In the meantime, I'll keep thrift-store hunting for $1 vinyl and $2 CDs and supporting new artists who aim up.

Rick Tomaszewicz

Just got back from the Salvation Army Thrift Shop. Big sale on CDs: three for a buck. Seventy classical CDs for $23! Sorry, high-resolution-download-seeking crowd, that would have cost you $1,260.

Stop worrying about having the latest technology.  Just enjoy the music.


corrective_unconscious

I think those dynamic range figures are largely theoretical, even in the cases where they are qualified by descriptors such as "pristine conditions." I also think those dynamic range figures do not include the limits imposed by ambient sound in the listening environment; I think they pertain (supposedly) only to the recording and playback chain itself.

And the ultimate, real world dynamic range capability of a given medium is far removed from how much of that capability actually gets used by a particular recording.

As others have mentioned, no one could think those early digitally recorded CDs sounded better than most even fairly well recorded and produced LP records. Nor could anyone think that what the market has moved to in bulk, compressed lossy digital files, offers better sound than Red Book CDs.

Music_Guy

Reviewers talk about systems and content so good that they take the listener into the concert hall or into the studio...

Funny, but for me, the experience I prefer is one of bringing the performance into my space. When I sit down to listen to prerecorded music, I want it to be clear, dynamic, and musical. It is just fine for me that the sound blends with my own acoustic space. While I look for improvements in clarity and dynamics in my system, I do not yearn for more spatial cues. It is uncanny how well a good two-channel setup does this.

I know the gamers and the videophiles will be pushing the industry toward total immersion. Digital signal processing is amazing! (price/performance) I sure hope there will be similar advances in the clarity of two-channel reproduction as well, in both source material and reproduction equipment.

(And, as others have said, I wish there were improvement in the dynamic flow encoded in the source material, without such a big price or analog.)


wozwoz

Moore's Law, the doubling of computing power, etc. are all very nice... and yet, in the last 15 years, the quality of music as consumed by most people has gone dramatically backwards, not forwards. For most, it has got worse, not better. Oh sure, for the audiophile elite we have hi-rez SACD and DSD, but the simple fact is that most consumers are now streaming at a hundredth of the bit rate of the studio masters over their iPhones, and if they even have standalone speakers, they are often 'effectively' mono docking stations, or USB sticks attached to a computer.

Perhaps for the first time in audio history, technology is causing sound quality to decline, not improve. Sadly, I don't think things will change much in 50 years. 

ppgr

Who needs dynamic range when commercial music rarely exceeds 12 dB and is getting more and more compressed?