Interviews

HDCD: Keith Johnson, Pflash Pflaumer, Michael Ritter Page 2

You therefore had to draw a line and say you've got a 40dB-dynamic-range system; I can't use the stuff that's below that. The problem is made worse if there's a good rim shot on a drum or a cymbal crash. A peak level on an orchestral session can be 110–120dB. The analog tape will saturate, but it only happens during a brief moment in time. You're not aware that it has happened, but a digital system will go crazy if you overdrive it. After the peak, you're back in the soft passages below the 40dB range where things don't work well.

The very thing the digital systems were touted as being very good at was the very thing they didn't do. They would do signal/noise ratio, but couldn't do dynamic range.

That was my observation of the things in digital that had to be fixed.

Pflaumer: I think it comes down to the fact that human hearing is very sensitive to very small details, even in the presence of large signals. The whole area is very difficult to measure properly. Just measuring the low-level performance of a converter, for instance, does not adequately show you what the converter does in the presence of a complex signal. But the ear is very sensitive to small things, even when they're riding in the midst of a very-large-scale signal. This happens when you have all kinds of instruments playing at the same time. The details of an oboe's reed sound, for example, get lost in the conventional digital recording because the details that distinguish the reed are very small compared to everything else that's going on. In conventional conversion, even with dither, the harmonic structure tends to get lost.

This may be a good time to mention some of the limitations of dither (footnote 4). There is certainly a prevalent misunderstanding these days that all you have to do is dither a 16-bit signal and you get enough resolution to do the trick. There are a lot of people who hold that view—or the view that dither with noise shaping ought to be enough. The problem is that dither is an averaging kind of phenomenon. You only get that resolution by averaging over time. Averaging works very well for lower frequencies in the musical spectrum. The problem is, of course, that it doesn't work very well for the higher harmonics of the signal.

Dither doesn't help instantaneous high-frequency components such as percussive edges—the little peaks a reed creates, or the very sharp, time-aligned compression peaks that a brass instrument makes. The resolution is not spectrally flat, and human hearing is able to detect these things. The human ear does not average things like the spectrum analyzer does. As a result, a lot of the delicate harmonic structure that gives rise to our sense of timbre is not properly preserved. Digital audio needs more bits, basically, than 16 real bits. You can't get the same thing by dithering [a 16-bit system].
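The averaging behavior Pflaumer describes can be illustrated with a small sketch. It is not HDCD code. It assumes a quantizer with a step of 1 LSB, triangular (TPDF) dither, and a steady input of 0.3 LSB; the function names are made up for this illustration. The sub-LSB level emerges only as the mean of many samples, while a short window of four samples can resolve it no finer than a quarter of an LSB.

```python
import math
import random

def quantize(x):
    """Round to the nearest whole step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def dithered_quantize(x, rng):
    """Add TPDF dither (sum of two uniform values, each +/-0.5 LSB), then quantize."""
    dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return quantize(x + dither)

def mean_of_dithered(x, n, rng):
    """Average n dithered, quantized samples of the constant level x."""
    return sum(dithered_quantize(x, rng) for _ in range(n)) / n

rng = random.Random(0)      # fixed seed so the run is repeatable
level = 0.3                 # assumed input: a steady signal 0.3 LSB high

undithered = quantize(level)                    # detail vanishes entirely
short_avg = mean_of_dithered(level, 4, rng)     # only multiples of 0.25 possible
long_avg = mean_of_dithered(level, 20000, rng)  # approaches 0.3 over a long window
```

A brief transient occupies only a handful of samples, so it sits in the position of the four-sample window here rather than the 20,000-sample one.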

We did a lot of experimenting with both dither and noise shaping as part of the evolution of HDCD. We did a lot of listening tests and implemented various classic curves that people publish for noise shaping. And basically, we didn't like what we heard very much. There was always a sense that something was a little bit unnatural about them. The lack of ability to preserve the harmonic structure of an instrument was part of it. The other part of it, we concluded, is that, in nature, a shaped noise floor doesn't exist. If you're in a hall listening to instruments, you don't have that kind of a funny spectral shape to the noise floor. Somehow, even though you can't hear it, there is still a sense that there's something not quite natural about it—like a pressure on your head. Even though you don't actually hear the noise floor, there's a sense that something's wrong.

Johnson: There's another factor, too. These schemes can use a significant amount of high-frequency energy that's focused at the extreme top end of the spectrum, and if it's played back on a cheap piece of electronics, like a boombox or something with integrated-circuit amplifiers, the TIM distortion generated in these things creates havoc. Quite often, some of these schemes that push dither to very high levels defeat the very purpose of the product. Inexpensive electronics can't play it back. Most high-end systems are much more tolerant of situations like that, and have relatively benign performance.

Harley: So HDCD uses no noise shaping?

Johnson: No, we don't use noise shaping.

Ritter: There's an important point to make here, and this noise-shaping discussion is a very good illustration of it. The team that invented and developed the HDCD process is an impeccable technical team that followed rigorous scientific and technical procedures. HDCD was definitely not the sort of thing that could be conceptualized or developed by a seat-of-the-pants, play-it-by-ear kind of approach. You couldn't get to where we are now just on the basis of how things sounded. There's some very serious scientific work involved with it. However, at the same time, throughout the development, how things sounded was very much a part of the equation. Even though we did very elaborate measurement test setups, at the end of the day the bottom line was, What did it sound like?

As Pflash pointed out, if you use conventional test equipment—spectrum analyzers or FFT machines—some of these noise-shaping approaches appear to have real benefits. However, when you have a controlled listening situation and very-high-resolution source material—Keith's first-generation analog masters—you can analyze these different technical approaches. After all the research we've done, it turns out that human hearing is far more sensitive than any measurement device—even the latest test equipment we have.

We couldn't have gotten to where we are today if it wasn't for a combination of technical expertise, scientific background and approach, an extreme degree of conversance with live and reproduced sound, and that high-resolution source material. There was a synergy between these to the point where I would say that we could not have achieved what we have without all those elements being brought together and applied rigorously over a period of years.

Harley: Keith's experience hearing the orchestra live, then the microphone feed, then what digital did to the signal, must have been a great asset.

Ritter: Exactly. We had those references. We knew what was possible. And then we had the mental horsepower to deal with solving the problems.

Harley: When did you first start working seriously on HDCD?

Ritter: It was in the spring of '86 that Keith first described some of the concepts to me. And then as Pflash got with Keith, the synergy started. It's quite an amazing thing. Not only do their talents complement one another, but the levels of their talents complement each other too. Both of these guys are brilliant, and used to a situation where they're a couple of years ahead of everyone else on whatever they're working on.

I did the other steps of getting the capital together and forming the business. We incorporated Pacific Microsonics in November, 1986.

Pflaumer: To be fair, we were all only working on it part-time in '86 and through about '89. I was still very much involved with Tops up through '89, so I was only able to tear myself away for a couple of weekends a month to confer with Keith and try to make the HDCD ideas gel.

Johnson: At that time, we were simulating the HDCD concepts in the analog domain. We would take one part of the system, isolate it, and then build a processor or whatever was necessary to develop one little piece. When Pflash came on board full-time, it was fortuitous timing; Pflash's knowledge and experience in digital signal processing, along with the availability of powerful DSP, gave us the opportunity to take it to the next level.

Ritter: We attempted to emulate some of this stuff on the most powerful computer we could find at the time, which was a very expensive Sun workstation. It ended up that we needed about eight times the computing power of this RISC-based workstation to run the HDCD encoding algorithms in real time!

After '89, there was a much higher level of development activity going on. Keith was working on better and more elaborate implementations of the A/D. There was a huge amount to do, because everything that we were doing was just beyond what anybody had done before. There were no off-the-shelf solutions.

Pflaumer: Before we could get into a realistic, real-time implementation of HDCD as a process, we wanted to build an A/D and a D/A which satisfied us—in terms of the basic quality limitation of conventional coding with more bits and a higher sampling rate—without worrying about trying to fit it through a 16-bit, 44.1kHz pipeline. In other words, we needed to put an A/D converter and D/A converter back to back to set the quality level we were working toward.

We first built an A/D and D/A that sounded pretty good, then implemented the various concepts that had been discussed as to what HDCD should be. We then implemented those ideas in a digital form and squeezed that into a 16-bit, 44.1kHz signal we could record.

For the first time we were able to do all of the things that we thought that we should do and had previously simulated in the analog domain. We could do them all simultaneously, in real time, and were able to process real audio signals and get the kind of quality we were striving for through a conventional 16-bit, 44.1kHz channel.

Ritter: That was a pretty exciting time, because there had been years of slogging with nothing to listen to. When we first started getting the thing working, it was way beyond any other kind of digital. It wasn't as good as it is today by any means, but the thing was working. It was very exciting.

Harley: Specifically, what's going on in the encode process that we have been discussing?

Ritter: For a variety of reasons, we can reveal only so much. It's important to reiterate here that HDCD is a holistic system, meaning that it addresses all areas of digital recording and reproduction. It has to. If you just say, "We make things better by doing x, y, and z," it doesn't begin to address the overall problem that we're confronted with. Therefore, the process itself wraps around the A/D and D/A conversion and is integral to it.

In that context, HDCD begins with an extremely high-quality, proprietary A/D converter, which is arguably better than any other converter that we're aware of in any form. It's not a little better; it's a lot better. It's better in terms of distortion generated in the conversion process, and it also has a very high degree of resolution. It has wider dynamic range, extended frequency-domain response, more bits than 16, and a much higher sampling frequency than 44.1kHz.

The signal that we get from the A/D conversion has far too much information to record or store. This signal has all that information and very low distortion at the same time. That signal is then analyzed using DSP techniques in real time. The algorithms that look at the signal are algorithms that were derived from our research into psychoacoustics and auditory physiology. We were concerned with how we hear mechanically in addition to how we hear subjectively. Those algorithms look at this high-definition signal and determine the components in the signal that would not fit in the normal 16-bit, 44.1kHz recording—signal components that are important in terms of how we perceive subjectively and also how we perceive objectively. The high-resolution signal is then decimated to a 16-bit, 44.1kHz signal that can be recorded on a compact disc, but with the additional information about the psychoacoustically important signal components added in.

The additional information is added in two fashions. Part of it goes into the linear PCM signal itself in a way that can be reproduced to a certain extent with standard playback equipment. You can hear some of this improvement and some of this additional information on standard playback equipment. Second, additional information goes into a buried control channel in the LSB [the 16th and least significant bit of each 16-bit audio sample]. The buried control channel doesn't occupy the entire LSB; it's done in a very clever fashion, occupying only a very small percentage of it. This is Pflash's work here.

Pflaumer: The encrypted control channel shares the LSB with the LSB of the music. One of the key ideas here is that the additional information is not needed in a steady-state fashion. There are certain times in the program material when you need to provide much more information than at other times. The side channel can share the LSB with program material, and gets inserted as needed. The decoder is watching for its presence and picks out commands as they're sent across through the channel.
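A toy version of an LSB-sharing side channel, inserted only when needed, might look like the sketch below. The interview does not disclose the real packet format, so everything here is an assumption made for illustration: the 8-bit SYNC marker, the 4-bit command width, and the function names are hypothetical, and no encryption is attempted.

```python
SYNC = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical 8-bit marker, not the HDCD pattern
CMD_BITS = 4                       # hypothetical command width

def embed_command(samples, start, command):
    """Copy samples, writing SYNC + command into the LSBs from index start."""
    bits = SYNC + [(command >> i) & 1 for i in reversed(range(CMD_BITS))]
    out = list(samples)
    for offset, bit in enumerate(bits):
        # Clear the LSB, then set it to the control bit; upper 15 bits untouched.
        out[start + offset] = (out[start + offset] & ~1) | bit
    return out

def find_commands(samples):
    """Scan the LSB stream for SYNC and return (position, command) pairs."""
    lsbs = [s & 1 for s in samples]
    packet_len = len(SYNC) + CMD_BITS
    found = []
    i = 0
    while i + packet_len <= len(lsbs):
        if lsbs[i:i + len(SYNC)] == SYNC:
            value = 0
            for bit in lsbs[i + len(SYNC):i + packet_len]:
                value = (value << 1) | bit
            found.append((i, value))
            i += packet_len          # skip past the packet just read
        else:
            i += 1
    return found

# Stand-in audio with all LSBs at 0, so the marker cannot occur by chance.
audio = [((i * 37) % 1000) * 2 for i in range(1000)]
encoded = embed_command(audio, 400, 0b1010)
commands = find_commands(encoded)
```

In this example the packet borrows 12 of 1000 LSBs, and no sample moves by more than one step. With real program material the LSBs are not all zero, so a practical scheme would need a marker long enough to make accidental matches negligible.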

Ritter: It's quite amazing—almost like something for nothing. You literally have this additional information sufficient to reconstruct the original high-resolution signal. However, we don't do it by taking away any resolution in the non-decoded playback. Essentially, there is no loss. On average, the additional information uses only one to five percent of one bit.

Harley: But is that a high enough data rate to transmit the reconstruction information?

Johnson: Yes, it is. In the encoder, we determine the process which works best, then send the information of which process was used down the control channel. On the playback side, the decoder says, "Ah, that's the process I need to perform to be the conjugate to the encode process."

All it needs to be is a number that says, "This is the process to perform." In the meantime you've got a powerhouse—the decoder—at the other end that's not creating information, but is programmed to do some fairly complicated activity.

Pflaumer: You can send with brute force lots more information in that channel, which may be desirable at certain times. But for the most part, because HDCD is a process that has anticipated the requirements under different conditions, the encoder can pick the appropriate process based on the analysis of the signal, and simultaneously tell the decoder which process it's picking. The decoder knows how to complement that operation. As a result, HDCD provides the equivalent of a lot more information without having to have the bandwidth.

Harley: You transmit the command to perform the restorative operation rather than the operation itself.

Johnson: Exactly. You could, for example, send a code that says, "Output Beethoven's Ninth." Because the receiving end has Beethoven's Ninth in memory, I can play back Beethoven's Ninth with just a few bits. It's an extreme case, but it makes a point about how HDCD works. It's a powerful technique that you could never do in the old analog system.
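The principle of sending a short process identifier, and letting the decoder apply the matching inverse, can be sketched as follows. The two processes in the table (pass-through and a halve/double gain pair) are assumptions chosen for illustration, as are the names and the clipping limit; the interview does not say which operations HDCD actually uses.

```python
# Table shared by encoder and decoder: id -> (forward, inverse).
# Hypothetical entries; only the id has to travel with the audio.
PROCESSES = {
    0: (lambda x: x, lambda y: y),              # pass through unchanged
    1: (lambda x: x * 0.5, lambda y: y * 2.0),  # halve on encode, double on decode
}

def encode_block(block, limit=1.0):
    """Pick a process from the signal's peak; return (process_id, coded block)."""
    process_id = 1 if max(abs(v) for v in block) > limit else 0
    forward, _ = PROCESSES[process_id]
    return process_id, [forward(v) for v in block]

def decode_block(process_id, block):
    """Apply the inverse of whichever process the encoder reported."""
    _, inverse = PROCESSES[process_id]
    return [inverse(v) for v in block]

loud = [0.25, -1.5, 0.75]    # peak exceeds the assumed limit of 1.0
quiet = [0.25, -0.5, 0.75]

loud_id, loud_coded = encode_block(loud)
quiet_id, quiet_coded = encode_block(quiet)
restored = decode_block(loud_id, loud_coded)
```

The side information for each block is a single small integer, however elaborate the operation it selects.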

Harley: Does the presence of this encrypted channel degrade the fidelity when played undecoded?

Ritter: No. That's why HDCD discs sound so good on standard playback. We're not throwing away anything.

Harley: What types of musical information that would get lost in the 44.1kHz, 16-bit bottleneck do you encode in the control channel and then reconstruct on playback?

Ritter: Timbral information, hall ambience, low-level information that gives you accurate timbral reproduction of instruments and voices. The additional information also preserves spatial cues.

Johnson: A lot of the things that you can't get by dithering. We developed other ways than dithering to preserve those things. A lot of what we do is looking at the signal on an instantaneous basis—what a spectrum analyzer doesn't do.

Ritter: That's an important point. HDCD preserves instantaneous information in the signal, in terms of what's perceived as frequency extension.

Harley: By perceived frequency extension, I understand that to mean that HDCD can preserve components in the signal that trigger a response in human hearing equivalent to a frequency response extending beyond 20kHz.

Footnote 4: Dither is a small amount of noise—either broadband white noise or narrow-band noise—added to the signal to randomize quantization error, allow the digital system to resolve information lower in amplitude than the least significant bit, and make digital encoding more linear.—Robert Harley
