MQA: Questions and Answers
System Noise Questions
Some questions have been posted in on-line comments and elsewhere:
a) MQA made the unprecedented step of displaying the signal-to-noise (S/N) ratio of all of the charts and graphs not in "dBFS" (dB below Full Scale), but instead by presenting all of their data in "noise per-root Hz dBFS". An example of this was in the original AES Journal article [2]. In this paper, Figure 8 shows that 16 bits of data yields a 144dB noise floor and that 24 bits of data yields a 192dB noise floor. I personally am unaware of any scholarly article concerning digital audio that has used this approach.
b) In the original paper Stuart and Craven claimed that 16-bit digital audio yielded a quantization noise floor of 144dBFS per-root Hz and that 24-bit digital audio yielded a quantization noise floor of 216dBFS per-root Hz. This is a full gain of 48dB over standardly accepted digital audio practice where it is widely agreed that 16 bits yields roughly 96dB S/N and 24 bits yields roughly 144dB S/N.
c) Eventually Stuart and Craven changed their presentation. Now they only claim a 24dBFS advantage by using "noise per-root Hz" in their presentation. All of their current presentation claims that 16-bit audio data yields 120dBFS noise per-root Hz and 24 bits yields 168dBFS noise per root Hz.
d) Still this is a 4-bit advantage over any other digital audio presentation I've ever seen in anything regarding digital audio.
Answers: As we shall show, this question is technically incorrect on each and every specific point. The questioner's incredulity arises from a misunderstanding of the way we have measured and plotted the data, and the graphs we have plotted are in fact precisely correct. We provide quick answers first, but because the topic is fundamental to audio we are also including a tutorial answer. The question is really about how to present scientific data, not a comment on the operation or performance of MQA itself. Readers who do not wish to read the following discussion are invited to skip to the section on temporal precision.
a) This step is not unprecedented. The type of analysis we use in our papers is normal in various areas of electronics and instrumentation physics; in a system as complex as MQA, it is essential for efficient analogue and digital design.
b) The "96dB" and the "144dB" relate to different quantities: the first relates to total noise power and the second to spectral density ("noise per-root-Hz"). The numbers are thus different because we are measuring different things. Similarly the "24dBFS" relates to yet another measurement unit ("noise-per-spectral-bin," as will be explained in the tutorial).
c) The questioner is interpreting a 24dB difference as a "4-bit advantage." But, as above, the "24dB" is a decibel difference between the numerical values of two physically distinct quantities (consider that one would not make a direct comparison between 'miles' and 'miles-per-hour'). What matters is the comparison between the noise from MQA and the noise from a 16-bit or a 24-bit floor taken as reference. We are always careful to plot both curves, so that one can see at-a-glance where MQA lies relative to the noise of conventional PCM digital audio.
Tutorial: Noise in electronics
Noise is an unwanted signal whose origin can be thermal, generated or from some other disturbance (eg, Brownian motion). In electronics, we can start with basic resistor noise. The amount of thermal noise we measure depends, inter alia, on the bandwidth and spectrum of the noise. As it happens Johnson noise has a white spectrum and so the noise power is broadly proportional to bandwidth. Therefore noise voltage is proportional to √(Bandwidth). This is described well here.
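The square-root relationship can be checked numerically. The sketch below uses the standard Johnson-Nyquist expression, v = √(4kTRB); the 1kΩ resistor and the 300K temperature are illustrative assumptions, not values taken from the text.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K (exact SI value)

def johnson_noise_vrms(resistance_ohm, bandwidth_hz, temperature_k=300.0):
    """RMS thermal noise voltage of a resistor: sqrt(4 k T R B)."""
    return math.sqrt(4.0 * BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)

# Illustrative (assumed) values: a 1 kOhm resistor at 300 K.
R_OHM = 1000.0
density = johnson_noise_vrms(R_OHM, 1.0)      # volts in a 1 Hz bandwidth
v_20k = johnson_noise_vrms(R_OHM, 20000.0)    # volts in a 20 kHz bandwidth
v_80k = johnson_noise_vrms(R_OHM, 80000.0)    # volts in an 80 kHz bandwidth

print(f"density in 1 Hz: {density * 1e9:.2f} nV/rtHz")
print(f"20 kHz: {v_20k * 1e6:.3f} uV, 80 kHz: {v_80k * 1e6:.3f} uV")
print(f"voltage ratio for 4x bandwidth: {v_80k / v_20k:.3f}")
```

Quadrupling the bandwidth quadruples the noise power but only doubles the noise voltage, which is why a density is quoted per root-Hz.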
Understanding how to estimate noise spectra is important where the noise is not spectrally white (eg, the pink characteristic of 1/f or shot noise in transistors or op-amps) or where the receiver has a non-uniform sensitivity with frequency (such as the human listener). Such noise can be expressed in volts or amps-per-root-Hz (eg, nV/√Hz) at different frequencies (footnote 1).
The measurement we often use to evaluate noise is its density in a 1Hz bandwidth, a convention that is completely standard in electronics and instrument physics (footnote 2). Why? Because we are talking not about "total noise" but about "noise spectral density", which might be different at different audio frequencies, and you need to know about that if you are to decide whether you can hear it or not.
Noise in a Digital Channel
In a digital channel the dynamic range is limited by quantisation noise. A rough estimate of dynamic range is 6.02 x N dB (where N = number of bits).
More precisely the dynamic range of an undithered system is a little higher at 98.09dB in the 16-bit case. In an LPCM channel using TPDF dither, the dynamic range is reduced by 4.77dB (to 93.32dB in the 16-bit case so long as the quantisation error and dither have a white spectrum).
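These figures can be reproduced with a few lines of arithmetic. The sketch below assumes the usual textbook model: a full-scale sine measured against quantisation noise of power Δ²/12, and TPDF dither spanning 2 LSB peak-to-peak, which adds Δ²/6 and so triples the noise power. The function name is hypothetical.

```python
import math

def dynamic_range_db(bits, tpdf_dither=False):
    """Dynamic range of an N-bit channel for a full-scale sine, in dB."""
    # Full-scale sine power relative to quantisation noise power (delta^2 / 12):
    # 20*log10(2**N) + 10*log10(1.5), i.e. roughly 6.02*N + 1.76 dB.
    dr = 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)
    if tpdf_dither:
        # TPDF dither adds delta^2 / 6, so total noise power is 3x: a 4.77 dB penalty.
        dr -= 10.0 * math.log10(3.0)
    return dr

for bits in (16, 24):
    print(bits, "bits:",
          f"undithered {dynamic_range_db(bits):.2f} dB,",
          f"TPDF {dynamic_range_db(bits, tpdf_dither=True):.2f} dB")
```

Both results are total-noise figures over the whole Nyquist band; they say nothing yet about how the noise is distributed in frequency.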
If the noise spectrum is white, then the power is spread uniformly throughout the Nyquist bandwidth. In any arbitrary fixed-width band, the noise power will be constant and we can obtain different measures of the noise according to bandwidth and/or weighting function.
A 1Hz bandwidth is used to describe noise-spectral density (NSD). This measure is helpful when dealing with shaped noise, hearing thresholds and, importantly, in a Shannon diagram where using per-root-Hz analysis allows us to represent signal area as data rate.
Most audio texts have in the past spared the reader the need to grapple with the distinction between total noise and noise spectral density. It has generally been adequate to quote total noise, or bit-depth, and accompany this with a graph showing the shape of any noise shaping that might be used. This approach breaks down, however, if more than one sample rate or the human listener is involved.
If a 16-bit TPDF channel is running at 44.1kHz then: NSD (noise-spectral-density) = -93.32 - 10 log(22050) = -136.76dBFS per-root Hz (or -140.1dBFS at 96kHz, or -143.1dBFS at 192kHz, etc.).
The spectral density of 16-bit noise in a channel running at 192kHz will be 6dB lower than in one running at 48kHz, because that same total noise is spread over a wider bandwidth (and most of it will lie outside the conventional 0-20kHz audio range). So it is not enough to quote total noise.
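The conversion from total noise to spectral density is a single subtraction, sketched below for a white noise floor spread evenly from 0Hz to the Nyquist frequency. The function name is hypothetical, and the -93.32dBFS total is the 16-bit TPDF figure quoted earlier.

```python
import math

def nsd_dbfs_per_root_hz(total_noise_dbfs, sample_rate_hz):
    """Noise spectral density of white noise spread evenly over 0..Nyquist."""
    nyquist_hz = sample_rate_hz / 2.0
    return total_noise_dbfs - 10.0 * math.log10(nyquist_hz)

TOTAL_16BIT_TPDF_DBFS = -93.32  # total noise of a 16-bit TPDF-dithered channel

for fs in (44100, 48000, 96000, 192000):
    nsd = nsd_dbfs_per_root_hz(TOTAL_16BIT_TPDF_DBFS, fs)
    print(f"{fs:>6} Hz: {nsd:.2f} dBFS per-root-Hz")

# Same total noise, four times the bandwidth: the density drops by about 6 dB.
drop = (nsd_dbfs_per_root_hz(TOTAL_16BIT_TPDF_DBFS, 48000)
        - nsd_dbfs_per_root_hz(TOTAL_16BIT_TPDF_DBFS, 192000))
print(f"48 kHz to 192 kHz: {drop:.2f} dB lower")
```

The total noise is identical in every row; only the bandwidth over which it is spread changes.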
Naïve interpretation of NSD might suggest an over-low threshold of audibility for low-level tones. In fact our ear integrates noise around each frequency according to a bandwidth that is a non-linear function of both centre-frequency and level. Nevertheless we can detect single tones lower than -96dBFS in a 16-bit channel. [12]
The concepts of 'noise spectral density' and of measurement within a 1Hz bandwidth and estimating audibility are to be seen in Stuart's peer-reviewed papers from 1972, 1994 and 2004. [11][12][13][16][17]
Given these precedents (and there are numerous others) it would clearly be incorrect to imply that these concepts have been invented to obfuscate the descriptions of MQA.
So the specific answer to the question is: we plotted the noise-spectral density of a TPDF-dithered 16-bit 192kHz channel at -143.1dBFS because that is indeed the right answer. This is not an attempt to mislead but rather an attempt at competent peer-to-peer communication.
Occasionally other measurement bandwidths can be useful to represent different impacts of a noise on a signal, eg:
1/3 octave (where measurement bandwidth is proportional to frequency)
ERB (equivalent rectangular noise bandwidth) weighting, where the measurement bandwidth follows the apparent (equivalent rectangular) noise bandwidth of the human ear; this is useful to estimate noise spectra vs tone thresholds. (See Figure 1.)
FFT analysis as a rapid but cruder approximation to tone vs noise comparison.
In an FFT analysis the noise level we see depends on: i) the number of bins; ii) the windowing used in the analysis; iii) the size of the display bins. So, eg, at 44.1kHz, a 2048-point FFT has a bin width of 44100/2048 = 21.5332Hz. A Kaiser 140 window (used in Adobe Audition) needs a correction of 3.29dB (footnote 3).
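The bin-width arithmetic can be sketched as follows. It assumes white noise, so that the level seen in each bin sits 10 log(bin width) above the 1Hz density, plus the window correction. The 3.29dB Kaiser figure is taken from the text as given rather than derived, and the function names are hypothetical.

```python
import math

def fft_bin_width_hz(sample_rate_hz, fft_points):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_points

def fft_noise_offset_db(sample_rate_hz, fft_points, window_correction_db):
    """dB to add to a 1 Hz NSD figure to get the white-noise level per FFT bin."""
    bin_width = fft_bin_width_hz(sample_rate_hz, fft_points)
    return 10.0 * math.log10(bin_width) + window_correction_db

KAISER_CORRECTION_DB = 3.29  # value quoted in the text, not derived here

bin_width = fft_bin_width_hz(44100, 2048)
offset = fft_noise_offset_db(44100, 2048, KAISER_CORRECTION_DB)
print(f"bin width: {bin_width:.4f} Hz")
print(f"offset relative to 1 Hz NSD: {offset:.2f} dB")
```

Under these assumptions, changing the FFT length or the window changes the displayed noise floor without any change in the underlying signal.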
In summary: We use the 1Hz measure in Information (Shannon) diagrams because we want to visualise the information content of the signal (eg, as in [2]).
Although 1Hz bandwidth analysis gives us the correct areas on a Shannon diagram, it is less useful for comparing tonal signals with noise. ERB weighting can indicate why we can detect signals lower than -96dBFS in a 16-bit channel.
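As a rough illustration of the ERB argument, the sketch below integrates a white noise floor over one auditory bandwidth. It assumes the Glasberg and Moore (1990) approximation, ERB = 24.7(4.37f/1000 + 1) Hz, which may not be the auditory model used in the cited papers, and it ignores the level dependence mentioned above. The NSD value is the 16-bit, 44.1kHz TPDF figure from the tutorial.

```python
import math

def erb_hz(centre_hz):
    """Equivalent rectangular bandwidth of the auditory filter.

    Glasberg & Moore (1990) approximation: an assumption for this sketch,
    not necessarily the model used by the authors.
    """
    return 24.7 * (4.37 * centre_hz / 1000.0 + 1.0)

def noise_in_erb_dbfs(nsd_dbfs, centre_hz):
    """White noise integrated over one ERB centred on centre_hz."""
    return nsd_dbfs + 10.0 * math.log10(erb_hz(centre_hz))

NSD_16BIT_44K1_DBFS = -136.76  # 16-bit TPDF channel at 44.1 kHz, per-root-Hz

for f in (250, 1000, 4000):
    level = noise_in_erb_dbfs(NSD_16BIT_44K1_DBFS, f)
    print(f"{f:>5} Hz: ERB {erb_hz(f):6.1f} Hz, noise in band {level:.1f} dBFS")
```

On this simplified model the noise competing with a tone in any one auditory band lies well below the total noise figure, which is consistent with tones below -96dBFS remaining detectable.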
In some white papers we have used FFT analysis with 21.5Hz bin-width (where the level is offset by 13.33 + 3.29 = 16.62dB) as this can give reasonable agreement between noise and tonal signals at low frequencies without the complexity of explaining auditory modelling. This was the case in the examples cited in part e) of the question.
As before, the level we show is precisely correct.
There is no misleading here. Furthermore, since measuring bandwidth is a key parameter we almost always draw the corresponding 16-bit channel reference level on plots for clarity and footnote the measuring bandwidth.
We'd suggest the following reading, even though some of it is from 25 years ago. [12][11][13]
Footnote 1: Examples here and here.
Footnote 2: In audio, we tend to talk about dB rather than nanovolts. To be pernickety, "dB per-root-Hz" is a shorthand for "signal amplitude in a 1Hz bandwidth expressed as a decibel value". The "per" does not mean exactly what it says because the decibel value does not double when the measurement bandwidth doubles.
Footnote 3: See the discussion here.