Bits is Bits?

High-quality digital audio systems require that all digital interfaces in the signal path exhibit signal transparency. The widely adopted AES/EBU and S/PDIF interfaces have been criticized for a lack of signal transparency; here we (footnote 1) address possible problems with such interfaces and present methods for improving the interface standard.

In a correctly functioning (uniformly quantized and sampled) digital audio system, the only observable signal impairments should be attributable to band-limitation and an additive noise residue. Thus, although digital audio's subjective sound quality has been criticized since the launch of the Compact Disc medium 13 years ago, the theoretical performance obtainable from the 16-bit linear PCM format sampled at 44.1kHz is superior to that of any analog source available to the consumer.

When correctly dithered using triangular PDF dither, a 16-bit digital audio signal possesses a dynamic range of 93.3dB, with zero distortion and zero noise modulation. The 16-bit format holds the possibility of even higher subjective dynamic range with minimally audible noise-shaping employed during CD mastering. Lipshitz et al (footnote 2) show that an increase in subjective dynamic range of up to 18dB is readily achievable when making the final truncation to 16 bits.
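The 93.3dB figure can be checked with a few lines of arithmetic. The sketch below assumes the usual definition (full-scale sinewave power divided by total noise power) and triangular PDF dither spanning 2 LSBs peak-to-peak; the function name is ours, chosen for illustration.

```python
import math

def tpdf_dynamic_range_db(bits: int) -> float:
    """Full-scale sine power over total noise power, in dB, with TPDF dither."""
    # Work in units of one LSB (quantization step = 1).
    sine_power = (2 ** (bits - 1)) ** 2 / 2   # sine with peak amplitude 2^(bits-1) LSB
    quantization_noise = 1 / 12               # uniform error spread over one LSB
    dither_noise = 2 / 12                     # triangular PDF, 2 LSB peak-to-peak
    return 10 * math.log10(sine_power / (quantization_noise + dither_noise))

print(round(tpdf_dynamic_range_db(16), 1))    # 93.3
```

The dither triples the noise power relative to the bare quantization error, which costs 4.77dB against the familiar undithered figure of about 98.1dB.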

Since any practical digital audio system will depart from this ideal performance, attempts are made to minimize measurable errors in digital components. For digital/analog converters (DACs), circuit-architecture advances including oversampling, noise-shaping, and 1-bit conversion result in greatly improved low-level resolution; the Compact Disc's theoretical performance can now be realized at a relatively low cost, at least upon replay.

In the quest for resolution, many "outboard" DAC units have appeared on the consumer market, with their sensitive D/A conversion process removed from the harsh electromagnetic environment inside the typical CD transport. Digital data is transmitted from the transport to the DAC along a coaxial or optical link (fig.1) in a serial format known as the Sony/Philips Digital Interface Format (S/PDIF). The S/PDIF standard is very similar to the AES/EBU format commonly used to interconnect professional digital components, and differs only in details, including transmission amplitude and subcode format. For much of this article, both interface standards will simply be referred to as the digital audio interface.

Fig.1 Two-box CD replay system with transport and DAC linked by S/PDIF digital audio interface.

Some users have reported subjective differences between various implementations of the interface. Peter van Willenswaard (footnote 3) was among the first to note a change in outboard DAC sound quality when switching between different CD transport units; he linked this to measurable differences in interface signal risetime. Audio reviewers' claims concerning digital interface sound quality include differences between optical links and wired coaxial connections, and changes in sensitivity to interface quality depending on DAC architecture.

The digital audio interface standard
Is the digital audio interface flawed? Specifically, how can these claimed subjective differences occur in a digital data link? After all, "bits is bits."

The AES/EBU and S/PDIF digital interface standards use biphase-mark encoding to transmit two-channel audio data, synchronization information, and subcode data over a single serial information channel (footnote 4); this coding scheme allows clock information to be embedded in the serial datastream. Fig.2 shows the serial subframe structure, which consists of 32 bit cells; each subframe carries the code for one audio channel.

Fig.2 Digital audio interface subframe format.

The subframe begins with a 4-bit synchronization signal "preamble" followed by a 4-bit auxiliary data block. Up to 20 bits of audio data can be transmitted, with LSB (least significant bit) first, and the MSB (most significant bit) occupying the last audio cell position. Finally, subcode information comprises validity, user, channel status, and parity bits.
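The field layout just described can be sketched as a short routine that assembles the 28 data cells following the preamble. The function and parameter names are hypothetical, and the even-parity rule over cells 4 to 31 is an assumption about AES3 that the text above does not spell out. The preamble is left out because it deliberately breaks the coding rule and cannot be expressed as ordinary data bits.

```python
def subframe_payload(sample: int, aux: int = 0, validity: int = 0,
                     user: int = 0, channel_status: int = 0) -> list[int]:
    """Return the 28 data bits that follow the 4-cell preamble (cells 4..31)."""
    sample &= 0xFFFFF                                     # keep 20 bits, twos complement
    aux_bits = [(aux >> i) & 1 for i in range(4)]         # cells 4..7
    audio_bits = [(sample >> i) & 1 for i in range(20)]   # cells 8..27, LSB first, MSB last
    bits = aux_bits + audio_bits + [validity, user, channel_status]
    parity = sum(bits) % 2                                # assumed: even parity over cells 4..31
    return bits + [parity]

payload = subframe_payload(255)
print(len(payload))        # 28
print(payload[4:12])       # the eight "1" bits of the value 255, LSB first
```

Together with the 4-cell preamble, the 28 bits returned here account for all 32 cells of the subframe.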

The biphase-mark encoding technique places cell transitions at the beginning and end of each cell for "0" bits, and at the cell's beginning, midpoint, and end for "1" bits. The preamble violates this coding rule, so that interface receiver circuitry can detect when each subframe begins. If the audio-data sampling rate fs = 44.1kHz, then the cell (0) width is equal to 354 nanoseconds, while the half-cell (1) width is 177ns; hence, the maximum rate of transitions is equal to 1,000,000,000/177 = 5.65MHz, though harmonics of the interface signal will extend to far higher frequencies.
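A minimal sketch of the coding rule and the timing arithmetic follows. It assumes one frame of two 32-cell subframes per sample period, so 64 cells at fs = 44.1kHz; the names are illustrative, and the preamble's deliberate rule violation is not modeled.

```python
def biphase_mark_encode(bits, start_level=0):
    """Return two half-cell levels (0 or 1) for every data bit."""
    level = start_level
    half_cells = []
    for bit in bits:
        level ^= 1                 # transition at the start of every cell
        half_cells.append(level)
        if bit:
            level ^= 1             # extra mid-cell transition marks a "1"
        half_cells.append(level)
    return half_cells

FS = 44_100                        # sampling rate in Hz
CELLS_PER_SAMPLE_PERIOD = 2 * 32   # two subframes of 32 cells each

cell_ns = 1e9 / (FS * CELLS_PER_SAMPLE_PERIOD)
half_cell_ns = cell_ns / 2
max_transition_rate_mhz = 1e3 / half_cell_ns

print(biphase_mark_encode([0, 1]))           # [1, 1, 0, 1]
print(f"{cell_ns:.1f} ns")                   # 354.3 ns
print(f"{half_cell_ns:.1f} ns")              # 177.2 ns
print(f"{max_transition_rate_mhz:.2f} MHz")  # 5.64 MHz
```

Without rounding, the maximum transition rate is twice the 2.8224Mbit/s cell rate, or about 5.64MHz; the 5.65MHz quoted above results from rounding the half-cell width to 177ns before dividing.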

Fig.3 shows time-domain simulation of a single subframe carrying a left-channel audio sample of value 255, equal to 1111111100000000 in 16-bit, twos-complement notation with MSB last. The mid-cell transitions can be seen at each "1" bit position, while biphase-mark violation displaces local cell transition positions in the preamble.

Fig.3 Left-channel subframe with audio data word representing 255.
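The bit pattern quoted for the Fig.3 sample is easy to confirm. The helper below is a hypothetical illustration that writes a twos-complement word in transmission order, LSB first and MSB last.

```python
def lsb_first_bits(value: int, width: int = 16) -> str:
    """Render a twos-complement word in transmission order (LSB first)."""
    value &= (1 << width) - 1      # wrap negative values into 'width' bits
    return "".join(str((value >> i) & 1) for i in range(width))

print(lsb_first_bits(255))         # 1111111100000000
print(lsb_first_bits(-1))          # 1111111111111111
```

The eight leading "1" cells of the first result are the positions where mid-cell transitions appear in the figure.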

The biphase-mark signal can be transmitted using either a coaxial or optical connection, while the interface decoder at the receiver has to extract the clock, the audio data, and the subcode information from the serial datastream. The clock signal embedded in the serial datastream is usually used to control a phase-locked loop (PLL), which in turn should provide a stable reference frequency for conversion circuitry interfaced to the analog world. A number of dedicated Audio Digital Input Circuit (ADIC) integrated circuits now available will perform these functions.

The circuit in fig.4 uses the Philips SAA7274 ADIC; negative-going edges on the S/PDIF input signal are detected and compared to edges on the system clock derived from the PLL's 11.2896MHz crystal oscillator. A difference signal is fed to a varicap diode, which pulls the PLL oscillator frequency to match the clock frequency embedded in the incoming interface signal. The PLL has a first-order loop filter with a break frequency of approximately 1kHz, allowing clock recovery to reject short-term variations in the input frequency (ie, high-frequency jitter).

Fig.4 Experimental interface receiver circuit using Philips SAA7274 ADIC.
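To give a feel for what a 1kHz break frequency implies, the sketch below treats the jitter transfer of the recovered clock as an idealized single-pole low-pass response. This is a simplifying assumption made for illustration, not a model of the fig.4 circuit, and the function name is ours.

```python
import math

def jitter_attenuation_db(jitter_hz: float, break_hz: float = 1_000.0) -> float:
    """Attenuation of input jitter at 'jitter_hz' for a single-pole low-pass response."""
    gain = 1 / math.sqrt(1 + (jitter_hz / break_hz) ** 2)
    return -20 * math.log10(gain)

# The oscillator frequency quoted above is 256 times the 44.1kHz sampling rate.
assert 256 * 44_100 == 11_289_600

for f in (100, 1_000, 10_000, 100_000):
    print(f"{f:>7} Hz: {jitter_attenuation_db(f):5.1f} dB")
```

Under this assumption, jitter at the break frequency is reduced by about 3dB and rejection improves by roughly 20dB per decade above it, while jitter components well below 1kHz pass through almost unattenuated.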

When the interface decoder supplies data to a DAC, the analog audio output will be corrupted if the samples are the wrong value (amplitude or "bit" errors), or are output at the wrong times (jitter).

Amplitude errors in the digital audio interface
The unfiltered digital-interface waveform is a binary signal whose transmitted information is determined by the transitions in the signal. One of the benefits of biphase-mark encoding is that the interface signal has only a small DC component—allowing interface signals to be AC-coupled, and edge detection to be performed using a comparator referenced to ground. If an audio data-cell transition is missed at the receiver, a bit error occurs, and a DAC connected to the receiver will output an incorrect sample value.
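Both points can be illustrated with a crude model, assuming bipolar levels of +1 and -1, ideal sampling of each half-cell, and no preamble. All names are hypothetical. The first part shows the DC balance of a biphase-mark sequence; the second shows how a mid-cell transition that the receiver fails to register turns a "1" into a "0".

```python
def biphase_mark_levels(bits, start=-1):
    """Two half-cell levels (+1 or -1) per bit, following the biphase-mark rule."""
    level = start
    levels = []
    for bit in bits:
        level = -level             # transition at the start of every cell
        levels.append(level)
        if bit:
            level = -level         # mid-cell transition for a "1"
        levels.append(level)
    return levels

def decode_by_edges(levels):
    """A cell is read as "1" if its two halves lie on opposite sides of ground."""
    return [1 if levels[i] != levels[i + 1] else 0
            for i in range(0, len(levels), 2)]

bits = [1] * 8 + [0] * 8           # the value 255, LSB first
levels = biphase_mark_levels(bits)
print(sum(levels) / len(levels))   # 0.0, no DC component for this word

corrupted = list(levels)
corrupted[1] = corrupted[0]        # first mid-cell transition is not registered
print(decode_by_edges(levels)[0], decode_by_edges(corrupted)[0])   # 1 0
```

In this example the corrupted cell is the LSB, so the error is small; the same failure in the MSB cell would produce a sample in error by half of full scale.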

Footnote 1: Professor Malcolm Omar Hawksford is Director of the Centre for Audio Research and Engineering, Department of Electronic Systems Engineering, at England's University of Essex. Chris Dunn is a Research Officer at King's College, London. This work was funded by the UK's Science and Engineering Research Council, and was originally presented as a paper, "Is the AES/EBU S/PDIF digital audio interface flawed?" (Preprint 3360), at the 93rd Audio Engineering Society Convention, October 1992, in San Francisco. It is reproduced here with the kind permission of the AES.—John Atkinson

Footnote 2: S.P. Lipshitz, J. Vanderkooy, and R.A. Wannamaker, "Minimally Audible Noise Shaping," JAES, November 1991, Vol.39, pp.836-852.

Footnote 3: Peter van Willenswaard, Stereophile, November 1988, Vol.11 No.11, pp.51-53.

Footnote 4: AES3-1985, "AES Recommended Practice for Digital Audio Engineering—Serial Transmission Format for Linearly Represented Digital Audio Data," JAES, December 1985, Vol.33, pp.979-984.