The 2011 Richard C. Heyser Memorial Lecture: "Where Did the Negative Frequencies Go?" Case Study 3: Digital Recording & Playback

The title of this lecture asks "Where did the negative frequencies go?" Once we enter the world of digital audio, they are very much present. Here is the spectrum of the music waveform I showed earlier:

And this is the spectrum of the same signal after it has been sampled in the time domain:

The positive (red) and negative (blue) spectra are mirrored around the sampling frequency and all of its harmonics, the latter extending to, if not infinity, then to something practically close to it. If you wish to play back time-sampled data, you need some way of eliminating all those spectral images other than the one in the baseband. Yes, a low-pass filter is required, but that filter turns out to have a very special function: it doesn't just remove the ultrasonic images, it reconstructs the original analog signal (below the Nyquist Frequency, that is, half the sample rate). The pulses representing the sampled amplitude are convolved with the impulse response of the filter to give the original signal, something that I found elegant in the extreme when I first understood it. That convolving is shown here in a diagram taken from John Watkinson's 1986 book on digital audio:
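That convolution can be sketched numerically. The following is a minimal illustration rather than anything taken from Watkinson's book: it assumes an ideal "brick-wall" filter, whose impulse response is a sinc pulse, and a hypothetical 1kHz test tone sampled at 44.1kHz. The names `sinc`, `reconstruct`, `FS`, `F0`, and `N` are chosen here for illustration only.

```python
import math

FS = 44_100.0   # sample rate in Hz (CD rate, chosen for illustration)
F0 = 1_000.0    # test-tone frequency in Hz (hypothetical)
N = 2_000       # number of samples kept

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x): the ideal low-pass impulse response."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, fs, t):
    """Sum one sinc pulse per sample, each scaled by that sample's amplitude."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

samples = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]

# Evaluate midway between two sample instants, near the middle of the record.
t = (N // 2 + 0.5) / FS
estimate = reconstruct(samples, FS, t)
exact = math.sin(2 * math.pi * F0 * t)
error = abs(estimate - exact)
print(f"reconstructed {estimate:+.6f}, exact {exact:+.6f}, error {error:.2e}")
```

The small residual error comes from truncating the record to a finite number of samples; a real reconstruction filter is also only an approximation of the ideal sinc response.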

I still marvel at the elegance of this concept. But what if you don't use a reconstruction filter? The effect in the audioband is inconsequential—just a small rolloff in the top octave, due to the aperture effect (the pulses have a finite length).

NOS DAC with no reconstruction filter, frequency response at 44.1kHz sample rate
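The size of that rolloff can be estimated with a short calculation. This sketch assumes the DAC holds each sample value for one full sample period (a zero-order hold), which gives a magnitude response of |sin(πf/fs)/(πf/fs)|; a real NOS DAC's analog output stage may add rolloff of its own. The function name `zoh_droop_db` is hypothetical.

```python
import math

def zoh_droop_db(f, fs):
    """Level in dB of a full-width zero-order hold at frequency f, relative to DC."""
    x = f / fs
    gain = 1.0 if x == 0 else abs(math.sin(math.pi * x) / (math.pi * x))
    return 20.0 * math.log10(gain)

FS = 44_100.0
for f in (1_000.0, 10_000.0, 20_000.0):
    print(f"{f / 1000:5.1f} kHz: {zoh_droop_db(f, FS):+.2f} dB")
```

Under that assumption the response is down by about 3.2dB at 20kHz for a 44.1kHz sample rate, and by much less lower in the audioband.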

Above the audioband, the conventional reconstruction filter gives a well-behaved analog signal. Reproducing data representing an equal mix of 19 and 20kHz tones, you get a spectrum in which the inverted images of those tones—the negative frequencies—are well suppressed.

DAC with conventional reconstruction filter, spectrum of 19+20kHz tones with a peak level of 0dBFS

But, my goodness, when we repeat this measurement with a so-called NOS DAC (for Non-OverSampling), which has dispensed with the reconstruction filter, we get this:

NOS DAC with no reconstruction filter, spectrum of 19+20kHz tones with a peak level of 0dBFS

Ugh! There are the negative frequencies in all their glory, as well as a host of related aliasing and intermodulation products dumped back into the audioband.
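Where those products land follows from simple arithmetic. The sketch below lists the first images of the 19 and 20kHz tones at a 44.1kHz sample rate, then the difference frequencies that would appear if the lowest images and the original tones passed together through a second-order nonlinearity. That nonlinearity is an assumption made for illustration; the measurement itself does not say which mechanism produced each spectral line.

```python
FS = 44_100                 # sample rate in Hz
TONES = (19_000, 20_000)    # test tones in Hz

def images(f, fs, k_max):
    """Image frequencies k*fs - f and k*fs + f for k = 1..k_max."""
    out = []
    for k in range(1, k_max + 1):
        out += [k * fs - f, k * fs + f]
    return out

# Images mirrored around the sampling frequency itself (k = 1).
first_images = sorted(img for f in TONES for img in images(f, FS, 1))
print(first_images)         # [24100, 25100, 63100, 64100]

# Assumed second-order difference products between each lower image and each tone.
lower_images = [FS - f for f in TONES]
diff_products = sorted({abs(img - f) for img in lower_images for f in TONES})
print(diff_products)        # [4100, 5100, 6100]
```

The images at 24.1 and 25.1kHz sit just above the audioband, while the assumed difference products at 4.1, 5.1, and 6.1kHz fall squarely inside it.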

So why do listeners like this mess? It can't be the aperture effect: –3dB at 20kHz is a subtle change at best. Some propose that it is the improved time-domain behavior of the system that the listeners are responding to . . .

NOS DAC with no reconstruction filter, impulse response

. . . compared with the impulse response of a conventional time-symmetrical FIR reconstruction filter:

DAC with conventional reconstruction filter, impulse response

Yet the differences between these two impulses all fall within the ear/brain's integration period. So unless people like the sound of their amplifiers misbehaving with the ultrasonic image energy, I have no idea what is going on here, other than to say that, whatever it is, it is not elegant.

An idea I did find elegant was Peter Craven's introduction of so-called "apodizing" reconstruction filters. Compare the conventional filter's impulse response above with the impulse response of a Craven apodizing filter:

DAC with minimum-phase, apodizing reconstruction filter, impulse response

The acausal ringing of the conventional filter of both the A/D and D/A converters has been replaced by a larger degree of causal ringing—it occurs after the event instead of before and after—at a slightly lower frequency. (The apodizing filter has a null at the original data's Nyquist Frequency.)
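The trade between acausal and causal ringing can be illustrated with a toy example. This is not Craven's design, and it is not a low-pass filter: it is only a three-tap sketch, with coefficients invented for illustration, showing that a time-symmetrical (linear-phase) filter and a minimum-phase filter can share exactly the same magnitude response while distributing their energy differently around the main peak.

```python
import cmath

def convolve(a, b):
    """Direct-form convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def magnitude(h, w):
    """Magnitude response of FIR coefficients h at w radians per sample."""
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))

def energy_before_peak(h):
    """Energy in the taps that precede the largest-magnitude tap."""
    peak = max(range(len(h)), key=lambda n: abs(h[n]))
    return sum(c * c for c in h[:peak])

zero_inside = [3.0, -1.0]    # zero at z = 1/3, inside the unit circle
zero_outside = [1.0, -3.0]   # zero at z = 3, its mirror image outside

linear_phase = convolve(zero_inside, zero_outside)   # symmetric: [3, -10, 3]
minimum_phase = convolve(zero_inside, zero_inside)   # front-loaded: [9, -6, 1]

print(linear_phase, energy_before_peak(linear_phase))
print(minimum_phase, energy_before_peak(minimum_phase))
```

Both filters have the same magnitude response at every frequency, but only the symmetrical one has energy ahead of its main peak. An apodizing filter additionally changes the magnitude response, which this sketch does not attempt to model.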

Again, people report that they prefer the sound of apodizing filters. A few years ago I published an article by Keith Howard in which he investigated the behavior of the reconstruction filter. As part of the preparation for that article, Keith sent me DVD-As of music treated with different filters. The recordings weren't identified, but Keith asked some of the magazine's writers to listen to the examples and rank them on sound quality. This was extraordinarily hard to do, but one difference did emerge as being consistently audible under blind conditions. When we were sent the key as to what filters had been used for each example, music reconstructed with the minimum-phase filter above sounded superior to music reconstructed with this filter:

DAC with acausal reconstruction filter, impulse response

Okay—the latter is nothing like we hear in nature. However, why does replacing acausal ringing at a frequency that people can't hear with causal ringing at a slightly lower frequency that people still can't hear result in better sound—er, sound that people tend to like more? Again, as Dick Heyser said, "there are a lot of loose ends!" (footnote 7)

Footnote 7: In subsequent conversations, I have been told that the ear/brain also acts as a wavefront arrival detector, that an acausal filter causes mental confusion as both the initial onset of the ringing and the arrival of the maximum energy peak are incorrectly interpreted as two separate events rather than one.

why is it so hard to accept that double-blind listening tests are difficult to achieve as JA has explained in his lecture?

the fact that our existences are commandeered by individual perception based on thousands of variables makes it very easy for me to understand, just as how one person may enjoy spicy foods but not grapefruit or where some may hear too much bass and others not enough. so many VARIABLES!!! culture, upbringing, what sounds you are surrounded by, traffic signals, your genetic structure, your actual physical position when listening. perception is a learned skill that we do not choose to accept, it just happens and it is different for every single person.

i think THESE are the sort of differences between individuals that make DBT difficult: everyone hears differently. there is no absolute sound.

the best example of how an ear and sonic preference can change is in the study of language and sounds. the chinese language has a completely different set of sounds to that of the english language, thus their speaking intonation, laughter, and music reflect their cultural and sonic inclinations. eastern and western and andean and greek and celtic and ... and ... all use completely different scales based on their preferences of sound learned over time through language and their environments.

Thus, i often wonder do hi-fi listeners across the globe prefer different sounding systems based on their installed sonic memory? or is there a constant in terms of preference across the globe? probably not. or even more interestingly, can one find similarities in preferences in sound based on linguistic sounds of an individual region? are the frequencies accented in the german language more easily noticed by a german in his hi-fi? DBTs are a waste of time. instead of focusing on the why not, it is much more fun to focus on the why.

the heart of all of this lies within JA's question: where do the negative frequencies go? there are aspects to our perception of sound that simply cannot be measured because they are based on individual perception which is different for every single one of us.

Ariel Bitran's picture

just wanted to let you know i haven't forgotten about you.

i've been south of the equator spending time with my father and brother, but now that I'm back in the Stereophile office, i'll answer your question in full a little later.

peace out homeslice.

be nice.

Ariel Bitran's picture

the two links provided earlier giving examples of some DBTs.

I found the matrixhifi test to be ignorable: who is the sample? how did they select these people? how are they representative of a population of listeners as a whole? in order to gather significance from these tests, the first and most important step is determining your sample, sample size, and how you select your sample. this just seems like a bunch of friends having fun. also, since there were multiple components being switched at the same time, system synergies could have been the cause of the weaker sounding more 'hi-fi' system. maybe those components weren't right for each other, but the cheaper system just sounded better. at least in ABX, they only changed one piece at a time.

what i found interesting in the ABX test was the user's ability to control the change of system component themselves. this helps eliminate the idea that the listener might feel like they are getting 'duped' or constantly searching/guessing for the difference.

Also regarding the ABX method, the # of times a difference was heard was 33. the # of times no difference was heard: 29. Interestingly, cables were the least discernible.

DBT is time consuming and for significant results you need a large sample size (to represent a large population of listeners). With a small sample size, as in both of these tests, you risk a greatly flawed hypothesis and will lack confidence in your results.

I don't want to whip out my textbooks, b/c i have other stuff to do, but 17 listeners is not nearly large enough of a sample size to even represent a population of 70,000 Stereophile readers (for example). Then we run into an even bigger problem of "who" is selected-->ie what type of sample you are trying to represent.

I've heard repeatedly that H/K does have a successful DBT model. I'm sure it takes them years to perform each experiment, and it is wildly expensive and time consuming. You need a large sample size for any of this stuff to matter, not a few dudes in a basement.

rl1856's picture

Do you like what you are hearing ?  If no, move on until you do.  If yes, then shut up and relax.  This is a hobby focused on the enjoyment of the creative output of artists.  It is not about how many proverbial angels can dance on the head of a pin.

Go listen to MUSIC !

hnipen's picture

Thanks John for a very interesting and exciting presentation, lots of interesting information here and I'm surprised, to say the least, by the lack of positive feedback.

There are many who are skeptical of some of the ways Stereophile does its measurements, and in some ways I'm one of them too, especially the way speakers are measured so close: large array speakers like the bigger Dunlavys and some others will not sum up very nicely in this way. We do, however, not live in a perfect world and Stereophile cannot afford an anechoic chamber, so this is probably the best they can do.

I wish John would share more of this kind of information as he has gathered lots of knowledge during a long interesting career at Stereophile and other places.

Go on John :-)

Merry Christmas

Cheers harald

absolutepitch's picture

John, thanks for getting this lecture pre-print available for us to read. I have been looking forward to this.

I agree that there is a lot of information combined into one lecture that anyone would need a lot of time to learn and understand the details. Pardon me for paraphrasing some of your words below.

Regarding the null result of DBTs, your description of the interpretation is in agreement with what I remember from statistics classes. I might add that a statistician would include a probability value or confidence band with the interpretation (something to the effect that 'the null hypothesis of no-difference-detectable is accepted with high probability'), and equally for the case when a difference is detected with high probability. I personally think DBTs should be done for product reviews, but agree that valid DBTs are difficult and time consuming (expensive) to do correctly, as Dr. Toole has shown in his writings.

The example of the 'backwards' impulse being not agreeable to listeners is something I have noticed in reference to digital recording. It's a wave form that does not occur naturally in music production, so reproducing it should sound 'bothersome'.

I also agree with the previous post, that more articles like this would be welcomed, to further highlight how complicated this field really is.

bernardperu's picture

I have read your essay with great pleasure (all of it!) and I think it is a great example of the Liberal Arts and Science coming together. In the end, it feels like a piece of applied music philosophy, which I find fascinating. It also seems to be free of business-oriented interests, as your opinion on cables clearly suggests. It is awesome and very unusual to meet an accomplished person who gives priority to his passions and principles over financial interests (as also expressed in your 2012 writing on the CES and Las Vegas).

I consider myself to be an audiophile that turns off the lights and tries to connect his emotions with the music with a very relaxed mind (this seems to be a category in itself, as the un-relaxed passive listeners who cannot focus on the music on a mid to long term basis tend to be very opinionated). Having said this, I recently purchased a pair of Class D mono amps that can clearly connect me to the music (Hephaestus brand). I have never listened to amps which are over 15k. Within similar prices, class D seems to be the better choice (but how relative this can be, John!)

I will continue to follow your writings with deep admiration and I thank you for making a difference on my musical experience (which is passed on to my girlfriend and my child). 



hollowman's picture

Thx for posting this lecture, JA!!
I also saw the Wilkinson/HTG YouTube video/interview in which you summed up some highlights of the AES lecture.
A question about the non-oversampling vs. oversampling debate ... you (JA) noted that people (subjective listeners) prefer the NOS DAC.
Was there a formal test (or survey) conducted that you may be referring to?

As far as NOS vs OS ... this has been an especially active (and debated) topic in the DIY community.
Older DACs (multibit R2R, not Sigma-Delta or one-bit MASH/bitstream) are commonly used in various DIY projects/experiments. And, all else held equal (e.g., all one does is shunt the OS chip in an older CD player), many audio hobbyists do, indeed, prefer NOS -- but many also do not (the ones who prefer OS note that once clean, well-regulated power is provided to the digital-filter IC, you get much better sound).
My own "breadboard" experiments convince me OS is better, tho' the NOS sound has some (few!) nice qualities not present in OS. This is almost like the age-old tube vs. SS debate.
I am aware of the very $$ Zanden DAC, and that's NOS, as are the modern AudioNote CD player and the HiFiMan 602. But almost all other commercial DAC (or digital sections) equipment since the late 1980s uses digital filtering (oversampling). And, I believe, OS has won out the mass-market and high-end/audiophile market for sound, important reasons (e.g., production costs vs. fidelity; or the all-important implementation issue: it's not just the DAC chip, but, also, decoder, pwr supply, output stage, etc.)
It's difficult to isolate the benefits/drawbacks of digital-filtering -- other than the shunting/bypassing method I described above -- due to various manuf/model designs using wide array of topologies, parts, etc. In other words ... Zanden (NOS) vs. Theta or Wadia (OS).

One final note ...
The 1st- and 2nd-generation CDPs were criticized in the audiophile community for having sub-std. sound. TTBOMK, these early players were all NOS.
Then came DF chips like Philips SAA7220 (famous for the faceplate moniker: "Fourfold Oversampling Digital Filter"), and commercial players (read: not modded or souped-up Philips units from Meridian/Cambridge/etc.) started to gain acceptance by serious audiophiles/reviewers.
Examples of this included various Philips/Magnavox, Yamaha and (of course) Sony models sold in department stores/mail-order-outlets at hugely-discounted prices.

So what happened to improve the sound? Was it mostly/partly OS? And/or was it better DACs? And/or better (analog) output stages? And/or better overall topologies, tighter tolerances, better PCB layout, more careful PS design?

Some of the zero-oversampling fanboyism is strange (other than the one-more-time-parasite-for-the-bored-audiophile pathology). In DIY threads and modern eBay Chinese kits (or even complete D/A NOS processors, mostly from China) new-old-stock DACs (like TDA1541 or TDA1545/1387) are used with just a simple output section and a $2 USB decoder (or S/PDIF receiver) to get data from some source.

The image below is a top-down view of a complete D/A processor based on a TDA1387 (Philips multibit R2R DAC IC from mid 1990s) -- it's a non-(zero-)oversampling config. (but uses 8 TDA1387s in parallel).
It's from China and sells for less than $125 (eBay, Taobao, etc.). I have no idea how it sounds.
The main point, however, is that: this is a fairly new offering in the marketplace ... but it's all OLD-SCHOOL ... simple topology/layout, R2R DAC, etc ... many engineers were doing this type of stuff in the mid-1980s, and henceforth.
So why is old new again?! Does NOS, indeed, sound better because most audiophiles (esp. young people, but also older geeks/reviewers who sold/got rid of their "vintage" gear) have heard nothing but oversampling or Delta-Sigma digital?
These probing queries go right to the heart of the AES lecture. Indeed, the Wilkinson video interview on Home Theater Geeks had this title: Episode 84: What Is Reality?
What is Reality?

hollowman's picture

Another aspect of recorded/playback audio (and psychoacoustics and related biology) is how human perception deals with:

-- NATURAL acoustic events, such as wind, leaves, birds, crickets, human voice
(IOW: random, short-duration, relatively QUIET events -- the type/duration of acoustics humans dealt with for most of the species' evolutionary history)

...vs. more "recent" acoustic exposures , such as ...

(these are much more continuous/linear, longer-duration, and louder events than what was statistically-signif. for most of hominid's evol. history).
With music, especially, there is so much more complexity (including the four std. music-theory qualities: harmony, tonality/timbre, rhythm, melody). Add words (song, opera, lyrics) .... and you have to, then, also engage the brain's symbolic/language interpreters ...hence, brain/mind acoustic organs are starting to get a real workout.
But let's not stop there: a good (= engaging) film with a complex orchestral score in the soundtrack makes things even more complicated -- i.e. the added visual-system integration! And the smell/flavor of popcorn and the sweetness/wetness/caffeine of that cola ...

I think the brain/mind can "learn" to cope with all this newer multi-tasking sensory environment. It (brain/mind) does seem to really like sensory "overload." Or movies and audio systems wouldn't be so popular.

To get to the point: Nature is a tough task-mistress. If you want to know her secrets, you've gotta have the passion for hard work (= scientific/empirical data collection and careful analysis) AND passion for philosophical engagement. JA has done much of this in the 2011 AES lecture. Good work!

BTW: One (of the many, many) reasons classic objectivists' tests (like ABX) are flawed is that they (mostly) concentrate on short/excerpted chunks of music tracks.

Charleski's picture

You were setting up gear in the vestibule, and thus presumably moving around the area a bit. Is it not possible (indeed, highly probable) that in the short delay while they ‘demagnetized’ the LP you moved between a resonant null and resonant mode? Setting up the bass in small spaces can be challenging because moving one or two steps is often enough to produce a marked change in the resonances. We all know the vital importance of siting speakers correctly.

So yes, you heard a difference in bass. But no, this was probably the result of well-known acoustic factors and nothing to do with some kooky tweak.