The Stereo Image

The author demonstrating stereo microphone techniques at an English audio show in 1981.

For most people the terms hi-fi and stereo are synonymous, and yet it is clear that there is still a great deal of confusion over what the word "stereo" actually means. There isn't even a consensus of opinion amongst producers of records, designers of hi-fi equipment, audio critics and music lovers as to the purpose of stereo, and considering that the arguments show no sign of diminishing in intensity, it is instructive to realise that 1981 sees both the 100th anniversary of Clement Ader's first stereo experiments and the 50th anniversary of Alan Blumlein's classic patent on stereo.

Ader placed telephone microphones in two groups, left and right, on the stage of the Paris Opera and subscribers listened on headphones to the twin signal which was transmitted over telephone lines. Blumlein's work was still experimental, but more theoretical in that it examined exactly what directional information needs to be preserved on a two-channel system in order that an accurate aural picture can be recreated using two loudspeakers.

Sound-Source Location
Before wading deeper into the morass of conflicting opinion, it is worth a look at how a human being perceives the direction of real-life sound-sources. Although the eyes undoubtedly play a major role in determining such directions, the ears provide an essential and evolutionarily desirable backup. Any caveman out of his cave on a dark night, and incapable of hearing where that quiet lip-smacking noise (curiously like that made by a hungry sabre-toothed tiger) was coming from, wouldn't stand much chance of passing his genes on to future generations. And so the sabre-toothed tiger, having encouraged the existence of a human hearing mechanism to determine direction, could pass happily into extinction, its destiny fulfilled.

When the wavefront emitted by a sound-source, such as our extinct tiger, reaches the head, it is obvious that unless that source lies in the plane bisecting the head at right-angles to the ear axis, it will reach one ear before it reaches the other. The further away from the median plane the sound-source, the greater the interaural delay, until it reaches a maximum of around 0.7 ms (the time taken for sound to traverse the ear-ear distance) when the source is to one side along the ear-ear axis (fig.1). For transient-type signals the brain probably acts directly on this time delay to derive the directional information, but for a continuous waveform with a frequency below approximately 700Hz, for which the ear-ear distance represents a half-wavelength, the brain interprets the time delay as a phase difference between the signals picked up by the two ears (fig.2) and correlates this phase difference with direction. For higher frequencies, however, it can be seen (fig.3) that there is more than one direction which will appear to give the same interaural phase difference, and thus the detection of direction by phase correlation becomes ambiguous.

581Stereofig01.jpg

581Stereofig02.jpg

581Stereofig03.jpg
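
The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the effective ear-to-ear path of 0.24 m and the speed of sound of 343 m/s are assumed values chosen to reproduce the article's 0.7 ms maximum delay, and the sine-law dependence of delay on azimuth is a common textbook simplification, not something the article specifies. All function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
EAR_PATH = 0.24         # m, assumed effective ear-to-ear path length


def max_itd(path=EAR_PATH, c=SPEED_OF_SOUND):
    """Largest interaural delay in seconds (source on the ear-ear axis)."""
    return path / c


def itd(azimuth_deg, path=EAR_PATH, c=SPEED_OF_SOUND):
    """Interaural delay for a source at the given azimuth.

    Assumes a simple sine law: zero on the median plane (0 degrees),
    maximum when the source is directly to one side (90 degrees).
    """
    return max_itd(path, c) * math.sin(math.radians(azimuth_deg))


def interaural_phase_deg(freq_hz, azimuth_deg):
    """Unwrapped interaural phase difference in degrees for a steady tone."""
    return 360.0 * freq_hz * itd(azimuth_deg)


def half_wavelength_frequency(path=EAR_PATH, c=SPEED_OF_SOUND):
    """Frequency at which the ear-to-ear path equals half a wavelength."""
    return c / (2.0 * path)


if __name__ == "__main__":
    print(f"maximum delay: {max_itd() * 1e3:.2f} ms")
    print(f"half-wavelength frequency: {half_wavelength_frequency():.0f} Hz")
    # Above the half-wavelength frequency the phase can exceed 180 degrees,
    # so after wrapping, different azimuths can yield the same phase reading.
    for azimuth in (15, 45, 90):
        wrapped = interaural_phase_deg(1500.0, azimuth) % 360.0
        print(f"1500 Hz, azimuth {azimuth:2d} deg: wrapped phase {wrapped:.1f} deg")
```

With these assumed values the half-wavelength frequency comes out a little above 700Hz, in line with the approximate figure quoted in the text.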

Above this critical frequency, fortunately, another mechanism starts to take over: the head increasingly casts an acoustic "shadow" when its size becomes of the order of, or larger than, the wavelength of the sound. The presence of this shadow means that an amplitude difference is introduced between the sounds perceived by the ears, enabling the brain to deduce the sound-source direction from the ratio of the two amplitudes. The pinnae further modify this amplitude difference with frequency, giving a direction-dependent spectral change which "sharpens up" the mechanism.

Obviously the amplitude and phase mechanisms will overlap over a range of frequencies dependent on head size, and reinforce each other until the frequency is such that the phase difference becomes totally ambiguous, apparently at around 1.2kHz for an average head. Above this frequency one has to rely on the amplitude mechanism alone for steady-state sounds, which has been shown to be less precise. However, if a transient occurs in an otherwise continuous high-frequency waveform, then this is equivalent to dropping in an audio "marker" to give the brain some additional time delay information. The tiger treads on a stick and our caveman immediately has an unambiguous clue as to the tiger's direction, and lives to pass on the relevant hearing mechanism to his descendants. Without the transients, the brain has to try somehow to reinforce the weak amplitude-difference clues. In fact the head is in constant slight motion, and its side-to-side scanning enables the brain to superimpose information about the rate of change of the amplitude differences upon those same differences.

When the sound-source lies exactly on the median plane—the vertical central plane between the ears—all the primary mechanisms mentioned cause the brain to come to the same conclusion, ie, that the sound-source is dead central. Whether it is above or below, in front or behind, is somewhat harder to resolve, and the brain has to interpret the spectral information from the pinnae and secondary clues such as reverberation to determine this aspect. The brain is relatively good at "in front or behind?" decisions (provided the source isn't exactly on the median plane), but apparently no good at "above or below?" This isn't particularly important, as one doesn't normally depend on hearing alone, the eyes being the main source of such information.

Amplitude Stereo
The genius of Alan Blumlein lay in his recognition that if the interaural phase differences are reproduced as amplitude differences between the signals fed to two loudspeakers, this alone is sufficient to define direction completely, provided the listener is equidistant from the two loudspeakers. If the listener is not equidistant, the resulting additional time delay gives conflicting information, with confusing and ambiguous results. John Crabbe covered the subject of off-centre stereo listening in great detail in his series of "Broadening the Stereo Seat" articles (HFN/RR, June/July/September 1979), and to avoid unnecessary complexity the use of the word "stereo" throughout this article will imply "central listener" exclusively.

So, to precis Blumlein, for a central listener the perceived position of any sound-source can be represented by a precise ratio of the voltages fed to the two (identical) loudspeakers. If the voltages are equal, then we have the "double-mono" situation where the sound should appear to come from a point halfway between the two speakers. As a ratio is a dimensionless entity, the image produced by any such voltage-ratio should not occupy any space, but should be perceived as a point-source situated somewhere on the line joining the acoustic centres of the two speakers. Ideally, ignoring room effects, there would be no reason for the position of this point, or its lack of width, to change with frequency. As long as the program has been recorded in such a fashion that positions are faithfully represented by inter-channel voltage ratios—and there lies the rub—a central listener will perceive discrete images correctly positioned all the way along the line (actually an arc centred on the listener) joining the speakers.
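
As a rough illustration of how a voltage ratio maps to an image position, the sketch below uses the stereophonic "law of sines", a textbook approximation for a central listener and in-phase, low-frequency signals. The article does not state this formula, so treat it as an assumption; the 30-degree speaker half-angle and the function names are likewise hypothetical.

```python
import math


def image_angle_deg(left_v, right_v, half_angle_deg=30.0):
    """Estimate the image direction from the two speaker voltages.

    Uses the sine-law approximation
        sin(image) = (L - R) / (L + R) * sin(half_angle)
    where half_angle is the angle of each speaker from straight ahead.
    Positive results lie toward the left speaker, negative toward the right.
    Assumes in-phase, non-negative voltages that are not both zero.
    """
    normalised = (left_v - right_v) / (left_v + right_v)
    return math.degrees(
        math.asin(normalised * math.sin(math.radians(half_angle_deg)))
    )


def level_difference_db(left_v, right_v):
    """Inter-channel level difference in dB (both voltages must be non-zero)."""
    return 20.0 * math.log10(left_v / right_v)


if __name__ == "__main__":
    for left, right in ((1.0, 1.0), (1.0, 0.5), (1.0, 0.25), (1.0, 0.0)):
        angle = image_angle_deg(left, right)
        print(f"L={left:.2f} V, R={right:.2f} V -> image at {angle:+.1f} deg")
```

Because only the ratio enters the formula, doubling both voltages leaves the estimated position unchanged, which is the sense in which the image is defined by a dimensionless quantity. Equal voltages give the central "double-mono" image, and a signal in one channel alone places the image at that speaker.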

Apart from a small percentage of human beings who can't be fooled by Blumlein's "amplitude for phase-differences" trick, two information channels can completely define a lateral stage, the second dimension—image depth—being provided by recorded reverberation, the brain automatically interpreting the presence of reverberation as evidence that a sound-source is further away. Whether this depth is subjectively convincing depends totally on the relationship between the recorded reverberation and the primary lateral images. Only a wavefront-sampling mike technique will preserve that relationship accurately, but more on that subject later.

Note the use of the word "completely." For a central listener, one has an absolute yardstick for assessing the quality of stereo imaging, without any reference to musical debate, the "real thing," direct/reverberant ratios, concert hall acoustics, the subjective experience, emotion quotient, or any other philosophical red herrings. Once we have our two information channels, as long as there is no crosstalk between channels—which will modify the voltage ratios—and provided the loudspeakers and their interactions with the listening room don't introduce any "widening" or "smearing" of the point images produced, then the sum of all those point images will form a continuum which accurately represents the recorded stereo image. As long as the narrow central image produced by a "double-mono" signal remains narrow and central at all frequencies, then the system must be inherently accurate as far as stereo is concerned. Any deficiencies then heard can only be related to the program. Likewise, philosophical discussions can then only apply to the manner in which the program was reduced to the two information channels, and the relationship of that program with the original live event, and not to the loudspeakers themselves.
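
The remark that crosstalk modifies the voltage ratios can be made concrete with a small sketch. It assumes symmetric, in-phase crosstalk in which a fraction k of each channel leaks into the other; real crosstalk is usually frequency-dependent and may involve phase shift, so this is an idealised model with hypothetical function names.

```python
import math


def apply_crosstalk(left_v, right_v, k):
    """Return the channel voltages after a fraction k of each leaks into the other."""
    return left_v + k * right_v, right_v + k * left_v


def normalised_difference(left_v, right_v):
    """(L - R) / (L + R): 0 for a central image, +1 or -1 for one channel only."""
    return (left_v - right_v) / (left_v + right_v)


def level_difference_db(left_v, right_v):
    """Inter-channel level difference in dB (both voltages must be non-zero)."""
    return 20.0 * math.log10(left_v / right_v)


def width_factor(k):
    """Factor by which symmetric in-phase crosstalk shrinks (L - R) / (L + R)."""
    return (1.0 - k) / (1.0 + k)


if __name__ == "__main__":
    k = 0.1  # assumed leakage, i.e. crosstalk 20 dB down
    left, right = 1.0, 0.25
    left_x, right_x = apply_crosstalk(left, right, k)
    print(f"level difference before: {level_difference_db(left, right):.1f} dB")
    print(f"level difference after:  {level_difference_db(left_x, right_x):.1f} dB")
    print(f"normalised difference scaled by {width_factor(k):.3f}")
```

In this model a double-mono signal stays exactly central, since both channels gain the same leakage, while every off-centre ratio is pulled toward unity, narrowing the stage.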

Imaging Accuracy
Take, for instance, the argument put forward by Julian Hirsch in the October 1979 issue of Stereo Review. While agreeing that if a sound originates from a certain direction in space, then an ideal stereo recording would preserve that direction, he writes: "I do not experience this sort of definite localisation of sound when I attend a concert...I can usually tell if the source is at the right or left of the stage, or perhaps in the centre...Even when I have spotted the soloist visually, closing my eyes blurs his physical relationship to the rest of the orchestra."

Many writers have commented on this imaging problem in the concert hall, and although the degree of uncertainty varies according to the listener, it is nevertheless a real attribute of live sound. But to develop from this observation an argument that the ability of a loudspeaker to reproduce the point images discussed earlier is unnecessary, is spurious. For instance, to quote Hirsch again: "Often when I receive speakers for testing, the manufacturer emphasises the stereo-imaging qualities of his product...I cannot comment on these qualities—in most cases because I do not find their presence or absence to have much to do with how 'good' I find the speaker's sound to be. It is very easy to hear differences between speakers and many of them could probably be described as 'stereo-imaging' qualities. It is not easy to decide which, if any, of these qualities is the most accurate or realistic" (my italics). Hirsch goes on from there in another article (Stereo Review April 1980) to conclude that sonic imaging can only be a matter of individual preference.

COMMENTS
Bogolu Haranath's picture

"Last days of the century" or "The thrill is gone" or still running "Against the wind" :-) ..........

Bogolu Haranath's picture

Excellent article ........... Still trying to digest all the information .............. Maybe I'll have to read it a couple more times ..........

Bogolu Haranath's picture

If you are serious about audio, sound production and re-production, this is one of the "must read" articles ..........

dalethorn's picture

"With the advent of cheap domestic digital playback systems in the near future, the 'joins' in that montage will be all the more apparent, and thus at last the consumer will be able to put real pressure on the record companies." -- I suppose we did in fact put some commercial pressure on them, but instead of converging to a more realistic image, we've split into different camps with different philosophies. The joy of digital as it were.

And it's not like headphones have made things better, except perhaps in remastering older recordings. In remastering you see, it's like oldies radio - you aren't going to hear all the crap they played along with the "better" stuff back then - you have the advantage of hearing just the "better" recordings hand selected for those playlists. Unless of course you're streaming, where you have to wade through the umm, "lesser" material.

Bogolu Haranath's picture

In my opinion, modern headphones (and in-ear 'phones) are saving high-end audio ......... They are a lot less expensive than loudspeaker-based systems and a lot more portable ...........

Glotz's picture

This will be helpful for a lot of neophytes wanting to learn more about sound perception, and it further hammers home the need for exacting controls when listening critically.

Everyone should read this twice.

hifiluver's picture

Yes, good article. When I first started buying sound equipment I visited this dealer who chose certain recordings and placed the speakers heavily toed in to create a '3d hanging in the air' presentation. I thought it was magical and something to be attained, only to realise 2 decades later that this type of 'sound' is illusory and nonexistent in the natural world. Attending live-music concerts (even the most amateurish ones), amplified or otherwise, helped put a frame of reference around expectations the next time the credit card came out.

Allen Fant's picture

Agreed,

Excellent article with a plethora of information to digest.
Plus, a photo of a dashing JA.

Bogolu Haranath's picture

Young JA looks like one of the members of the young Beatles .......... Now he looks like one of the members of ZZ Top, maybe? :-) .............

spacehound's picture

He was doing his Julian Vereker impersonation.

spacehound's picture

I think John made it up.

soundhound's picture

Great article, yet like all such articles there is little in-depth analysis of the MS technique. This technique seems to have so much going for it, yet I never hear of it being used for classical recording. Is there some drawback to its practical use?

Bogolu Haranath's picture

Also, "Decca tree" type of recording technique is not mentioned ..............

John Atkinson's picture
Bogolu Haranath wrote:
Also, "Decca tree" type of recording technique is not mentioned

The Decca Tree is a variant on the 3 spaced omnis technique. It's a spaced pair of omnis with a center fill mike placed forward of the other two. See https://en.wikipedia.org/wiki/Decca_tree. I used it for Stereophile's Duet album: see www.stereophile.com/content/idueti-and-two-carry-your-soul-away-page-4.

John Atkinson
Editor, Stereophile

Bogolu Haranath's picture

"#that POWER" :-) ...........

Bogolu Haranath's picture

Some audio reviewers have advocated for 3 front speakers (instead of 2) for accurate imaging, including depth perception ......... That may work well for "Decca Tree" type of recordings, with 3 channel audio (like SACD, Hi-rez audio Blu-ray etc.) ........... That 3 speaker placement could be problematic, if we listen to other types of music recordings like Pop/Rock etc., where the recordings are 2 channel ........... We constantly have to move the speakers, for listening to other types of music ..........

Bogolu Haranath's picture

To add to the above ............. I am glad JA talks about "binaural" recordings ........... Binaural recordings seem to be having a resurgence in recent years, because of the popularity of headphones and in-ear phones ....... Also, modern DAWs can be helpful to compensate for recording deficiencies ..........

hollowman's picture

With the hair style, beard, microphones ... all in the context of recording engineering ... one would swear an uncanny resemblance to....

Bogolu Haranath's picture

JA did make some great recordings for Stereophile ..........

Bogolu Haranath's picture

JA mentions sitting far back in the concert hall, for integration of sound ............ If someone sits too far back, they could have a problem hearing the soft passages ...........

dalethorn's picture

Many of my better recordings have such a dynamic range that my listening location can't accommodate them until 2-4 AM. I see DR numbers all over the place, but the loudest to softest sounds in my recordings (those that are necessary to hear) must be 30-40 dB different.

Bogolu Haranath's picture

Maybe JA could come up with an updated modern version of this same topic, and publish it in Stereophile ........... This essay is almost 40 years old ............

dalethorn's picture

Given that it's "Stereo Image" and modern stereo recordings are more than 60 years old (25 years older than this article), what could possibly need updated?

Bogolu Haranath's picture

Some of the techniques like M/S type (as one of the readers mentioned) and "Decca Tree" are not mentioned in this article ............ Also, modern DAW processing/recording is not covered ............

Bogolu Haranath's picture

To add to the above ............ Some of the older recordings could be re-recorded (with new and different artists) and re-mastered .............

dalethorn's picture

And would any of those efforts to "remaster the catalog" bear any resemblance to a 3-letter acronym that begins with 'M'?

Bogolu Haranath's picture

Rubinstein Nocturnes is a good example of what re-mastering can do ........

dalethorn's picture

But how much personal effort went into remastering Rubinstein versus remastering Radka Toneff? If you read the story on the latter, you'd see that they made significant improvements that justified the purchase to anyone who was vaguely interested. If the effort to remaster Rubinstein is not especially greater than the average MQA remastering - even though there are "clearly audible improvements" as the sales pitch typically goes, then those things you mentioned as justification will raise a huge wall of cynicism in the audiophile community.

What I'm saying in effect is, the real goodies we get are generally unrelated to those "M/S type, Decca Tree, DAW etc." issues. Not to diminish those things - just saying where the priorities likely are.

dalethorn's picture

Clarification -- not diminishing the Rubinstein remaster either. It was likely a labour of love. Those other things are technical, and would be part of an automation process, like MQA.

Bogolu Haranath's picture

True, true, true ........... In addition to all of the above, Iso-Mike recording technique is not covered ........ This article is 40 years old and IMHO, needs to be updated ...........

dalethorn's picture

The problem is, the article is already too big and complex now, and too daunting for newbies to look into. If the article were to be expanded to cover the important newer recording techniques and other relevant discoveries, it would be extremely large and indigestible even by AES standards.

So who will pay to turn this into a college-level course, which it needs to be? The reason I ask is because colleges and their curriculum don't generally serve audiophilia.

Bogolu Haranath's picture

I am sure you, me and some others on this forum will read it ........... We are all "dedicated audiophiles", aren't we? :-) ............

dalethorn's picture

See, the problem is much bigger than you suggest. Let me give you a real-life example. Many people whine and complain about social issues in our society, of which audiophilia is just a part. And audiophilia is based on principles, not just a set of facts. Now in the larger world, when people are wont to disagree on nearly everything, they appoint representatives to arbitrate their differences. And still, many (millions) are not willing to compromise their principles in order to move forward on day-to-day issues, and so those (who will not compromise) cannot be part of the arbitration processes.

So if our principles are more absolute than society at large (I believe they are), and we intend to maintain those principles as we move forward on our education, discovery, and enrichment of our hobby, we will need strong leadership to make those moves. Not only that, but a very strong commitment to that leadership by the vast majority of audiophiles. Does that sound like a cult? I can't say ..... but what I can say is without it, we will drift along exactly as we are doing now, and the standards will be determined by the most successful players.

dalethorn's picture

For example, J. Gordon Holt. There are other names, but until someone can take up his position as the erstwhile godfather of audiophilia, we the audiophile sheep will remain scattered.

Bogolu Haranath's picture

JA and other reviewers at Stereophile are strong (cult) leaders :-) .........

dalethorn's picture

Leaders - plural sense of leader. Multiple leaders, multiple opinions.

Bogolu Haranath's picture

Ok ...... Let us all make Bob Stuart (of MQA fame) our fearless leader :-) .......... He can convince and influence anybody :-) ...........

dalethorn's picture

You wouldn't want a supreme leader who is divisive, now would you? And I'm not suggesting for a moment that Bob would want to be divisive, but ..... choose the wrong leader and there goes your hobby.

david-p's picture

"In particular, use of a coincident technique, with its capture of the acoustic in which the musicians are performing, necessarily implies that the acoustic should be both suitable for the kind of music and attractive in its own right. This is rarely the case and, although ideally recordings should only be made in one of the apparently small number of good venues, commercial realities mean that the convenient location of a hall and its facilities often outweigh its total lack of a good acoustic. Walthamstow Town Hall, and All Saints, Tooting, for instance, are often used, yet the excessive wash of reverberation in such places, when empty of an audience, makes the live orchestral sound strange indeed, and not a sound that one would particularly want to record at all. The conductor also has a problem in hearing all of the orchestra!"

I worked on EMI recordings in both these places during the 1970s. Tooting was a special case, though some excellent recordings were made there; but I would dispute the above description of Walthamstow Town Hall. The Giulini recording of Verdi's Don Carlos demonstrates this. It was done in Walthamstow and no difficulties were encountered in making an excellent recording.

Nearly all the recordings I make today have a fig 8 stereo pair (or in the case of surround recordings a WXY ambisonic mic) as their basis. The exceptions are recordings of organ and other keyboard instruments, and choirs, where it is undesirable to have the ability to locate the position of individual strings, pipes or singers. In my opinion, for these cases omnidirectional mics work better, giving a pleasant stereo spread, with the expected blend.

Regarding "suitable acoustic", mentioned above: unless one is just intent on making a record of a performance in an acoustic over which one has no control (e.g. concert recording), I do not understand why, in this day and age, when we are not short of older recordings of excellent quality covering most of the "classical" repertoire, anyone would elect to try to make a studio recording in an "unsuitable acoustic".

As far as making this topic into a "college course" is concerned, I did this, though more than one semester is needed, and taught it with great success over more than 20 years in the UK and USA. Many of my former students are well-known successful engineers and producers.

DougM's picture

JA states that perceived image depth is a function of perceived reverberation, but I would argue that it's also a function of tonal balance, in that a more prominent midrange will make an instrument sound closer to the listener, and a recessed midrange will make it sound more distant.
