Of Headphones to Come Page 2

The problem is that virtual audio via headphones remains very difficult to achieve. What makes today different from the past is that researchers now understand the challenges fairly well, and that current DSP technologies have the computing power to do the job. The remaining task is to dig into all the bits and pieces of the problem, come up with efficient, practical solutions, then figure out how to get it all to work together. That last bit may be the hardest, because there are a lot of variables in this equation:

• Individual HRTFs must be generated for each listener. The resulting file must be broadly compatible with equipment made by many manufacturers.

• Headphones will have to be very well behaved in the time domain. Localization cues generated by DSP must be reproduced with no added garbage. Impulse responses must be very clean, or we won't be able to clearly hear the cues. (Actual frequency response is less problematic, as tonal-correction curves can be built into the headphones.)

• Much of the intended use of these types of headphones is with smartphones and tablets. Decisions must be made about where the various computations are performed. The virtual audio reality may be rendered by the portable device, but it's likely that all the changing HRTF cues occasioned by head movements will be rendered in the headphones themselves, which will have significant computing power.

• New types of content and formats will have to be standardized so that they can be interpreted by hardware from a variety of manufacturers.

• Perhaps most difficult of all, headphones will have to be transparent to sound in the real environment around the listener. One of the major goals here is a mixed reality, in which you can interact with your normal environment as usual, while hearing artificial sounds superimposed on the real sounds. Imagine, if you will, the kids hearing a Pokémon giggling in the bushes as they search for it.
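The head-movement item in the list above can be made a little more concrete. What follows is only a rough sketch, not any manufacturer's method: it assumes a rigid spherical head of 8.75cm radius, and it uses Woodworth's classic approximation of the interaural time difference (ITD) as a stand-in for a full, individualized HRTF. The function names and the angle convention (positive azimuth and yaw to the right) are hypothetical, chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # metres per second, roughly room temperature

def relative_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of a world-fixed source relative to the nose, in (-180, 180]."""
    rel = (source_az_deg - head_yaw_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel

def itd_seconds(rel_az_deg):
    """Woodworth spherical-head ITD; positive means the right ear leads."""
    # Fold rear angles onto the front hemisphere: on a sphere, a source at
    # 150 degrees produces the same time difference as one at 30 degrees.
    lateral = math.asin(math.sin(math.radians(rel_az_deg)))
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (lateral + math.sin(lateral))

# Two virtual sources, one front-right and one rear-right.
front, rear = 30.0, 150.0

# With the head still, the time-difference cue cannot tell them apart.
still_front = itd_seconds(relative_azimuth(front, 0.0))
still_rear = itd_seconds(relative_azimuth(rear, 0.0))

# After a 10-degree turn to the right the cue shrinks for the front source
# and grows for the rear one, so the renderer must update it continuously.
turned_front = itd_seconds(relative_azimuth(front, 10.0))
turned_rear = itd_seconds(relative_azimuth(rear, 10.0))
```

A real system would swap `itd_seconds` for a lookup into the listener's measured HRTF set, but the bookkeeping is the same: subtract the tracked head yaw from the source position before choosing the cues, and the virtual source stays put in the room as the head turns.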

So, along with being a difficult technical problem, there is also a significant convergence problem. Numerous industry standards must be developed before creators can produce content with complex formats that can be transported to and rendered for individual consumers using a variety of devices.

I knew that three-dimensional sound through headphones was tempting, but for a long time I thought the problem was just too complex, and that it wouldn't happen any time soon. I hadn't yet put together in my head all the pieces I've described, and I doubted manufacturers would have the will to overcome the difficulty. After attending the AES conference, and digesting and reporting on many of the papers presented, I suddenly found myself believing that it would happen. There's just too much at stake: too many cool things to come from this technology, and so much money to be made by those who figure it out. As I sat in a room with 100 highly paid researchers, each on a mission to develop ways in which people will hear sound through headphones in the future, I could feel the industry's intense will to get this job done. Here's a description of the Headphone Conference, from AES's webpage:

"More than 300 million pairs of headphones were sold in 2015, and people are using headphones everywhere. The popularity of 'smart and wearable' devices has driven developments in low-power processors and sensors that are enabling the augmentation of headphones with features more typically associated with hearing aids or smartphones. Therefore, this conference will focus on technologies for headphones with a special emphasis on the emerging fields of Mobile Spatial Audio, Personal Assistive Listening, and Augmented Reality. This conference will assemble scientists, developers, and practitioners who are involved in any head-worn hearing technology, be it in theory, technical design, application or evaluation. The conference will enable an interdisciplinary dialogue across the headphone and hearing aid industries."

Gaming and Pokémon stuff will certainly be profitable, but the real money is in developing something as ubiquitous as the smartphone. I draw your attention to the last sentence of the above description: "The conference will enable an interdisciplinary dialogue across the headphone and hearing aid industries."

One of the ideas discussed at the conference was that of personal assistive listening, in which the sounds around you are modified in some way to improve your sense of hearing. Here is the crossover with the hearing-aid industry: There are many cases in which those who enjoy normal hearing might find it nice to hear even more clearly. One technique described was the canceling of incoherent, diffuse noise and the augmentation of coherent sources, that is, sound sources that are spatially well defined. Imagine sitting in a loud, crowded restaurant, talking with your friends: their voices will be nearby and spatially coherent; the din of the crowd will be diffuse. It's possible to suppress the background noise and augment the coherent sound of the friends sitting at your table, to allow you to clearly hear them even in such noisy environments.
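To give a feel for how coherent and diffuse sound can be told apart, here is a deliberately simplified sketch. It is not the technique presented at the conference: it assumes one microphone per ear, works on the full band at zero lag rather than per frequency band, and uses a sine wave and random noise as stand-ins for a talker and a crowd. The function names and the `floor` parameter are hypothetical.

```python
import math
import random

def frame_coherence(left, right):
    """Zero-lag normalized cross-correlation of two equal-length frames."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy > 0.0 else 0.0

def coherence_gain(left, right, floor=0.1):
    """Gain near 1 for coherent frames, held at `floor` for diffuse ones."""
    return max(floor, min(1.0, frame_coherence(left, right)))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n = 2048

# Stand-in for a nearby talker: the same waveform reaches both ears.
voice = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(n)]

# Stand-in for restaurant din: unrelated noise at each ear.
noise_left = [rng.gauss(0.0, 1.0) for _ in range(n)]
noise_right = [rng.gauss(0.0, 1.0) for _ in range(n)]

talker_gain = coherence_gain(voice, voice)             # frame is kept
crowd_gain = coherence_gain(noise_left, noise_right)   # frame is turned down
```

Multiplying each frame by its gain turns down the frames that look diffuse and leaves the coherent ones alone. A practical system would estimate coherence separately in many frequency bands and allow for the small arrival-time difference between the ears, which this zero-lag version ignores.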

Or imagine firefighters who could don special headsets that suppress the roar of the flames but augment the sounds of human voices, allowing them to more easily find survivors. Rescue workers might be able to use such "bionic" hearing to help them locate the muffled voices of people buried in the rubble of collapsed buildings. And, of course, there are military applications.

Taking it a step further: Those traveling in foreign countries could use smart headphones to hear on-the-fly English translations of what people are saying. Or, a step beyond that, cameras and autonomous-driving technologies could be combined in a headset that would allow the blind to follow a trail of sonic breadcrumbs as the headset listens and watches for obstacles and traffic lights.

In short, we'll stop thinking of headphones as a way to make phone calls and listen to movies and music, and start thinking of them much more as we do smartphones—as personal assistants, fitness-training aids, and reality enhancers.

Some of these devices will be full-size headsets with dropdown visual displays similar to that of Microsoft's HoloLens, which superimposes virtual visual objects on the real environment. Such devices will permit the mixing of aural and visual realities. But many such devices designed for everyday use will be more discreet and mounted in the ear: mashups of hearing aids and in-ear monitors. They'll also likely have interchangeable cover plates of different colors and designs, to suit the user's and the day's fashion requirements.

What does all this have to do with traditional two-channel audiophiles? Not all that much. I'm talking about something that will happen five or ten years from now, and two-channel recordings aren't going to magically go away. But there are a few things that may affect the audio avocation we so love.

Just as, 30 years ago, the best-sounding headphones came out of the pro-audio market, it's likely there will be an early drive for the professional virtual-audio systems needed in content creation. A couple of generations on, these systems might sound very good. We all know that room acoustics play an important role in the sound of a good stereo system; future headphone systems will be able to synthesize any number of room acoustics. While they may not replace a big, serious hi-fi rig, they may make a high-quality listening experience portable and available at considerably lower cost, thus making high-fidelity sound available to more people, more of the time.

One of the profitable areas left to music producers today is live concerts. The same virtual audio/video headset hardware used for future gaming could also be used as a way to distribute pay-per-view "you are there" concert experiences. But the original content for these concerts (and music videos, computer games, and movies) will use an object-oriented encoding system similar to Dolby Atmos. In other words, sound won't be assigned to a number of audio channels to be reproduced by a matching array of loudspeakers, but rather as various movable aural objects emitting sound from positions in space. Additionally, such technology will synthesize the acoustic response of the room or space in which the sounds are made. For audiophiles, this means that there will be ever-increasing pressure for content created with or recorded in new spatial-audio formats, which would then have to be downmixed for replay through two-channel or surround systems.
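The difference between channels and objects, and what a downmix to two channels involves, can be sketched in a few lines. This is an illustration only and not the Dolby Atmos format: the `AudioObject` structure, the azimuth convention (speakers assumed at plus and minus 30 degrees), and the choice of a constant-power pan law are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    azimuth_deg: float   # hypothetical convention: -30 is the left speaker, +30 the right
    samples: list        # mono signal carried by the object

def stereo_gains(azimuth_deg, speaker_half_angle=30.0):
    """Constant-power pan: left**2 + right**2 == 1 at every position."""
    clamped = max(-speaker_half_angle, min(speaker_half_angle, azimuth_deg))
    pan = (clamped / speaker_half_angle + 1.0) * math.pi / 4.0   # 0 .. pi/2
    return math.cos(pan), math.sin(pan)

def downmix_to_stereo(objects):
    """Render positioned objects into fixed left and right channels."""
    length = max((len(o.samples) for o in objects), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        gain_left, gain_right = stereo_gains(obj.azimuth_deg)
        for i, sample in enumerate(obj.samples):
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right

scene = [
    AudioObject("violin", -30.0, [1.0, 1.0]),
    AudioObject("cello", 30.0, [0.5]),
    AudioObject("voice", 120.0, [0.2, 0.2]),   # behind the listener in the original scene
]
left, right = downmix_to_stereo(scene)
```

Note what happens to the "voice" object: its position behind the listener is clamped to the right speaker, because two channels have nowhere else to put it. That loss of spatial information is the price of the downmix.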

My inner audiophile cringes.

On the other hand, I could easily see world-class symphony orchestras making recordings using special soundfield microphones, recordings that would produce a very convincing immersive listening experience through a professional-quality virtual headphone system. I would argue that, given enough time for complete development, these systems might deliver a listening experience superior to two-channel or surround sound, as they can produce the illusion of a sound coming from any direction, seemingly enveloping you with sound. While high-end virtual audio systems may never sound as refined as the best two-channel systems, they may offer a heightened sense of immersion in a soundfield. About that, my inner headphone geek gets enthused.


And as long as I'm making predictions: There's currently only one company that controls a significant swath of the personal-audio market, from content sales through delivery to the hardware it's played through and the software to control it: Apple Inc. and its subsidiary Beats Electronics. Given the Apple ecosystem of content, software, and hardware, and Beats' extraordinary dominance in headphone sales, Apple is poised to develop and deliver a mixed-reality experience without the need for any industry standards other than those used in content creation. I suspect that this gives Apple a built-in lead of three years on everyone else. I don't expect that, in the long run, they'll end up being the best at it; I do expect that they'll do the job well enough for the average consumer, and that they'll do it first. It'll be the iPod/Beats by Dre phenomenon all over again.


Jason Victor Serinus

Thank you.

mvs4000

No mention of binaural??

arve

The traditional "binaural" is merely a party trick that happens to work for a few people. What Tyll is talking about here is "Binaural 2.0", where instead of the binaural cues being pre-embedded in the recording, and completely static, the cues are computed on-the-fly, adapted to the shape and size of your head and ears, and to your hearing, plus it's adapted to the motion of your head and body, so the location stays fixed.

In other words: It's merely a much more convincing binaural than what you've seen before, and one that can be synthesized for other media than headphones; using things like beamforming speaker arrays, you will be able to project sound to points in a 3d space in your room.

tonykaz

Headphone designs today are similar to Auto Designs of the early 1900s -- just starting out.

We Audiophiles are the early adopters, enjoying the incredible versatility and cost advantages of personal/portable High-End Audio devices.

This "fresh" group of Engineers will launch us into a new "Domain" of accessibility.

It's an exciting time to be watching this unfold; it's like the Future is "Now"! Phew.

Thanks for taking the time for this effort, your group of lads are "Five" Stars!!!

Of course we'll need Bob Katz to keep producing "A" level recordings and we'll need to discourage the horrible stuff aimed at Radio Station Play lists. I'm kinda thrilled to consider the idea that we'll be using VR to enjoy the Detroit Symphony from any Beach we happen to be sitting on.

This is Exciting Stuff!

Tony in Michigan

DougM

I believe millions of Sennheiser 414s and 424s, AKG K140s and K240s, Koss Pro 4AAAAs and other quality phones were bought by music lovers before the advent of the Sony Walkman!

dalethorn

Gordon Holt of Stereophile recommended several headphones well before the Kosses and Sennheisers. Given that stereo has been with us for 60-plus years now, and given that 4 and 5 channel have been around for a long time too - and yet stereo is still dominant in hi-fi (especially headphones), I don't expect much to change. Users will adopt more DSPs for this or that, but the ultimate realism (binaural) is already several decades old, so there's hardly a necessity to improve on that. Recording for speakers, and then developing "realism generators" to undo that to sound better on headphones, is an exercise in futility. Better to include a sub-track on the media with the binaural mix, etc.

arve

Since you're describing it as "several decades old": Binaural as you know it - embedded into a recording - has issues. Severe issues.

The first one is that the cues are static. When we get positional cues in real life, we get them not only by processing the ITD and ILD (the differences in arrival time and level between the two ears), but also by moving our head about to figure out how those cues change when we do. A pure binaural track simply cannot do this.

The bigger elephant in the room is that they're recorded using a normative HRTF, and the illusion falls completely apart if your ears (including the ear canal) are the wrong shape or size compared to what is assumed in the dummy head recording. It will get even worse if your head is the wrong size.

The embedded cues in the recording become virtually worthless when these criteria don't match, and many people don't experience spatial cues at all, or experience them in the wrong place.

I belong to the category of people who don't get much out of binaural recordings folded down to a stereo track: Sounds that should come from in front of me tend to instead come from behind, and height cues are weaker than they should be.

What Tyll is talking about in this piece is how "binaural" can be improved by instead making object-oriented recordings - where sound sources occupy a point in space, and the binaural cues (and for that matter room characteristics) are synthesized especially for your

dalethorn

Disagree. I've been using headphones longer, and have a large collection of binaural and conventional recordings. HRTF is mostly a myth, as different manufacturers, most of them AES members, have their own opinions. The recorded quality is the big issue. I obtained a copy of an album by a "Les Baxter" recently from a Stereophile reader's recommendation - easy listening music, and some of the music there is startlingly realistic, despite the age. Anyone who's done 170 headphone reviews as I have knows the larger issues, the biggest of which is frequency response, and today's products are all over the map. You still read people saying "we all hear differently", which of course is irrelevant to accurate reproduction.

The bottom line is quite simple - manufacturers need to manufacture, reviewers need to review, and we poor listeners are stuck with the junk that comes along, unless we're lucky enough to find a gem amongst the junk. Nothing important will change hopefully, as a very long history has shown. The best stereo recordings of the late 1950s still sound great, on headphones no less. Take a listen around, at the "loudness wars" for example. Those people aren't going to hand you recordings that represent the best in musical realism. People are frequently promised Utopia by everyone from musicians and manufacturers to politicians. Ain't gonna happen.

Russell Dawkins

You said a mouthful, dalethorn, touching on a notion that escapes most people, namely:
"You still read people saying 'we all hear differently', which of course is irrelevant to accurate reproduction."
It seems difficult for some to process the fact that however compromised a person's hearing is, that is how they experience reality and if the reproduction system is capable of sounding realistic, it will sound realistic to anyone, almost without regard to specific hearing impairment.

stalepie

"On the other hand, I could easily see world-class symphony orchestras making recordings using special soundfield microphones—recordings that would produce a very convincing immersive listening experience through a professional-quality virtual headphone system."

Reminds me of this...