Contingent Dither

If there is one thing I've learned in almost 28 years (ouch) of audio writing, it's that audience reaction is fickle. Sometimes readers will swallow the most contentious pronouncements without indigestion, only to choke on throwaway lines you've invested with little importance. It just goes to confirm that human communication involves senders and receivers, and they aren't always in synchrony.

I was pretty certain, though, when I'd dotted the last i and crossed the last t of a piece for Hi-Fi News last October (published in the February 2005 issue), that it would elicit howls of protest. What the article suggested was that (a) it is often unnecessary to apply dither when requantizing 24-bit recordings to 16-bit resolution, and (b) if you do add dither unnecessarily, then it has an adverse effect on sound quality. On the audio-adapted Richter scale, I reckoned these to rate at around 6 and 8 respectively, and was ready to run to the nearest door frame when the ground began to shake (footnote 1). In fact, there had already been one major tremor: in private e-mail exchanges with respected UK independent recording engineer Tony Faulkner, he let me know with characteristic forthrightness that his experience said I was wrong, wrong, wrong.

In the event, he was the only one to berate me to my face. Others may have seethed, but they did so privately. Thus encouraged, and at John Atkinson's invitation, I am going to repeat the heresy here, and update you on its latest developments. That I should escape a mauling a second time seems unlikely.

This story began around the time I last mentioned Tony Faulkner's name in these pages, in the course of preparing "The Law of Averages" (Stereophile, January 2004). He had provided me with 24-bit WAV files of two of his orchestral recordings, captured at 176.4kHz sampling rate, to experiment for myself with the adjacent sampling averaging technique he uses for downsampling to 44.1kHz for CD release—a controversial topic in its own right. For interest, I converted one of these recordings to 16-bit both with and without redithering, and burned the two files, together with the 24-bit original, to DVD-R for comparison (using Minnetonka Audio's discWelder Chrome DVD-Audio authoring software). Note that there was no downsampling applied—the original 176.4kHz sampling rate was retained in all cases.

I anticipated two possible outcomes. Either there would be sufficient inherent noise in the recording to render redithering unnecessary in the conversion to 16-bit, in which case the truncated and dithered versions would sound much the same; or there wouldn't be sufficient noise in the recording, in which case the quantization error resulting from nondithered truncation would sometimes be objectionable, rendering the undithered version sonically unacceptable. I reckoned the latter result the more probable.

You can guess what I'm going to say next: It didn't turn out that way. With the 24-bit original as the reference, I preferred the sound of the undithered 16-bit version. It wasn't quite as open and airy as the 24-bit source, but it had much the same spatial and dynamic feel overall, while the dithered track sounded less open, less expressive, more CD-like. As a reality check, I took the disc to an evening listening session at the home of Max Townshend of Townshend Audio, where he and my Hi-Fi News colleague Ivor Humphreys, without any coaching, also expressed a preference for the undithered version, and the same surprise at being told which it was.

By then I had written a software utility to extract and amplify the quantization error that results from truncating a 24-bit file to 16-bit without dither. I pointed it first at the Faulkner recording and then at a handful of 24/96 tracks on my computer's hard disk, culled from those music DVD-Vs and DVD-As in my collection that offer an unsullied 24/96 bitstream via the S/PDIF output of a suitable player (in my case, a Pioneer DV-939A). Listening to the resulting error files proved mostly an exercise in enduring what sounded like random, white-spectrum noise. Only very occasionally could something untoward be heard, typically either a low-frequency wump or a much more obvious graunch, indicating that the quantization error was, for those brief periods, correlated with the signal. Mostly these episodes occurred at the beginning or end of a track, where the gain was being ramped up before the start of the music or down again at its end.
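The extraction step can be sketched in a few lines. What follows is a minimal illustration of the idea, not the author's actual utility: truncate each 24-bit sample to 16 bits by discarding the bottom 8 bits, take the difference as the quantization error, and shift it up so it becomes easily audible.

```python
import numpy as np

# Sketch (not the author's utility): extract and amplify the error left
# behind when 24-bit samples are truncated to 16 bits without dither.
def amplified_truncation_error(samples_24bit, gain_bits=8):
    """samples_24bit: integer array in the 24-bit range (-2**23 .. 2**23 - 1).
    Returns the truncation error, amplified by 2**gain_bits for audibility."""
    s = np.asarray(samples_24bit, dtype=np.int64)
    truncated = (s >> 8) << 8   # zero the 8 discarded LSBs (truncation, no dither)
    error = s - truncated       # residue, always in the range 0..255
    return error << gain_bits   # shift the error up so it is easily heard

# Toy input: a quiet 24-bit ramp
ramp = np.arange(0, 1024, dtype=np.int64)
err = amplified_truncation_error(ramp)
```

If the error file sounds like featureless noise, the truncation error is effectively random; any audible structure (the "wump" or "graunch" above) betrays correlation with the signal.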

This finding suggested that only during these fades was there insufficient inherent noise in these recordings to provide effective dither at the 16-bit level, while for the remainder of the track microphone and other noise was present at sufficient amplitude to obviate the need for redithering during conversion to 16-bit. Spectrum-analyzing the background noise from one of the tracks (Sara K.'s "Brick House"), which I excised from the short gap between the gain being fully raised and the music starting, confirmed this (fig.1). Comparison with the equivalent spectrum for 16-bit triangular probability density function (TPDF) dither at the optimum amplitude of 2LSB peak–peak shows that the inherent noise in the recording is at a significantly higher level, and therefore likely to provide effective dithering. To be certain of this, we have also to ascertain the noise waveform's probability density function (PDF), although it would have to possess a most unlikely PDF for it not to be an effective dither at this amplitude. In fact, as fig.2 shows, the noise has what looks to be a normal or Gaussian PDF, as you would expect of what is probably predominantly microphone noise.

Fig.1 Noise spectrum from the beginning of Sara K.'s "Brick House" (blue trace) with that of 16-bit TPDF dither (red) for comparison. (Sampling rate 96kHz and 8192-point FFT in both cases.)

Fig.2 Histogram of sample amplitudes within the "Brick House" noise (high-pass filtered at 5kHz to remove studio background) shows it to have a Gaussian-like probability density function.
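The kind of PDF inspection shown in fig.2 can be cross-checked numerically with simple shape statistics. This sketch, my illustration rather than the article's analysis, contrasts Gaussian noise (skewness and excess kurtosis both near zero) with TPDF noise, which, being the sum of two uniform sources, is triangular and has an excess kurtosis of -0.6.

```python
import numpy as np

# Rough PDF check: Gaussian noise has skewness ~0 and excess kurtosis ~0;
# TPDF dither (sum of two uniform RPDF sources) has excess kurtosis of -0.6.
def skew_kurtosis(x):
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)

rng = np.random.default_rng(1)
gauss = rng.normal(size=200_000)
tpdf = rng.uniform(-0.5, 0.5, 200_000) + rng.uniform(-0.5, 0.5, 200_000)
print(skew_kurtosis(gauss))  # both values near 0
print(skew_kurtosis(tpdf))   # excess kurtosis near -0.6
```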

A back-of-an-envelope calculation shows that this outcome is not so surprising as it may first appear, particularly in the case of purist recordings where microphones are positioned some distance from the performers so as to capture the contribution of the room acoustic. Microphone noise is conventionally specified as an equivalent sound pressure level (SPL), a figure of 15dBA SPL being typical for high-quality capacitor microphones, which is roughly equivalent to 17dB SPL without A-weighting. If we add this figure to the 93.7dB signal/noise ratio of optimally TPDF-dithered 16-bit, this suggests that the recording's peak level must correspond to 110dB SPL or more for the mike noise to fall below the required dither level. In other words, roughly this SPL is required at the microphone for dithering at the 16-bit level to become necessary. (This is an oversimplification that takes no account of the PDF of the mike noise, but it puts us in the right ballpark.) In purist recordings of smaller-scale music, this SPL may well not be achieved.
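The envelope arithmetic itself is two lines, using the figures quoted above (illustrative round numbers, not measured data):

```python
# Back-of-envelope check of the SPL threshold quoted in the text.
mic_noise_spl = 17.0     # dB SPL: 15dBA-equivalent mike self-noise, unweighted
snr_16bit_tpdf = 93.7    # dB: S/N of optimally TPDF-dithered 16-bit (figure from text)
peak_spl_threshold = mic_noise_spl + snr_16bit_tpdf
print(round(peak_spl_threshold, 1))  # 110.7, ie, "110dB SPL or more"
```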

This was about as far as I'd progressed before writing the HFN article. I was and remain convinced that careful listening to the quantization error is the most sensitive and relevant test of its randomness (or otherwise), and the need (or otherwise) to apply dither. But it was clear that a more formal means of testing for randomness would provide a reassuring cross-check and harder evidence of my contention that 24-bit recordings can often be converted to 16-bit without the need for additional dither. (My second claim, that the addition of unnecessary dither can harm sound quality, is of course something that can be judged only by listening.)

A standard test of randomness in time-series data—which is what a digital audio signal comprises—is the autocorrelation function. What this does is compare the signal with a delayed version of itself, the result being a correlation coefficient that can range between +1 and –1. A coefficient of +1 indicates perfect in-phase correlation (the delayed signal is identical to the original), while –1 indicates perfect anti-phase correlation (identical but inverted). By contrast, a correlation coefficient close to zero indicates that there is very little similarity.
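In code, the autocorrelation coefficient at a given lag can be sketched as follows (my illustration; a sinewave delayed by a whole period correlates in phase, and by a half period in anti-phase):

```python
import numpy as np

# Normalized autocorrelation: compare a signal with a delayed copy of
# itself; the result lies between +1 and -1.
def autocorr(x, lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                 # remove any DC offset
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

t = np.arange(1000)
sine = np.sin(2 * np.pi * t / 50)    # period of 50 samples
print(autocorr(sine, 50))            # one full period of delay: close to +1
print(autocorr(sine, 25))            # half a period: close to -1
```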

A simple "lag 1" autocorrelation test is commonly used to test for randomness, wherein the signal (or other time series) is compared with itself using a delay of a single sampling period. If the autocorrelation coefficient for this lag falls within a statistically determined band around zero, then the signal is presumed to be random; if it falls outside this limit, the signal is presumed to have some structure. More complex tests (such as the Box-Ljung) are available which use a larger number of autocorrelation lags to make the call of "random" or "not-random."
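The lag-1 test can be sketched like this (my illustration, not the author's tool; the ±1.96/√N band is the standard 95% confidence limit for white noise):

```python
import numpy as np

# Lag-1 randomness test: the signal is presumed random (at the 95% level)
# if its lag-1 autocorrelation falls inside the band +/-1.96/sqrt(N).
def lag1_test(x):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r1 = float(np.dot(x[:-1], x[1:]) / np.dot(x, x))
    band = 1.96 / np.sqrt(len(x))
    return abs(r1) < band, r1, band

rng = np.random.default_rng(0)
noise = rng.normal(size=100_000)           # white noise: r1 should be tiny
tone = np.sin(np.arange(100_000) * 0.01)   # strongly self-similar signal
print(lag1_test(tone)[0])                  # False: structure detected
```

Applied to a quantization-error file, a coefficient inside the band supports the "no dither needed" reading; a coefficient outside it flags the correlated episodes described earlier.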

Footnote 1: For my own musings on this subject, see "As We See It" in the April 1996 issue.—John Atkinson