Computer Audio Reviews

MP3 vs AAC vs FLAC vs CD

John Atkinson Mar 08, 2008

As Wes Phillips recently reported on this website, CD sales are down and legal downloads of audio files are up. Stereophile has been criticized more than once for not paying enough attention to the subjects of MP3 and other compressed file formats, such as AAC, and for offering no guidance at all to readers about how to get the best sound quality from compressed downloads.

These criticisms are correct. We don't.

The reason is simple: Although they are universally described in the mainstream press as being of "CD quality," MP3s and their lossy-compressed ilk do not offer sufficient audio quality for serious music listening. This is not true of lossless-compressed formats such as FLAC, ALAC, and WMA lossless—in fact, it was the release of iTunes 4.5, in late 2003, which allowed iPods to play lossless files, that led us to welcome the ubiquitous Apple player to the world of high-end audio. But lossy files achieve their conveniently small size by discarding too much of the music to be worth considering.

In the past, we have discussed at length the reasons for our dismissal of MP3 and other lossy formats, but recent articles in the mainstream press promoting MP3 (examined in Michael Fremer's "The Swiftboating of Audiophiles") make the subject worth re-examining.

Lossless vs Lossy
The file containing a typical three-minute song on a CD is 30–40 megabytes in size. A 4-gigabyte iPod could therefore contain just 130 or so songs—say, only nine CDs' worth. To pack a useful number of songs onto the player's drive or into its memory, some kind of data compression needs to be used to reduce the size of the files. This will also usefully reduce the time it takes to download the song.

Lossless compression is benign in its effect on the music. It is akin to LHA or WinZip computer data crunchers in packing the data more efficiently on the disk, but the data you read out are the same as went in. The primary difference between lossless compression for computer data and for audio is that the latter permits random access within the file. (If you had to wait to unZip the complete 400MB file of a CD's content before you could play it, you would rapidly abandon the whole idea.) You can get reduction in file size to 40–60% of the original with lossless compression—the performance of various lossless codecs is compared here and here—but that increases the capacity of a 4GB iPod to only 300 songs, or 20 CDs' worth of music. More compression is necessary.

The MP3 codec (for COder/DECoder) was developed at the end of the 1980s and adopted as a standard in 1991. As typically used, it reduces the file size for an audio song by a factor of 10; eg, a song that takes up 30MB on a CD takes up only 3MB as an MP3 file. Not only does the 4GB iPod now hold well over 1000 songs, each song takes less than 10 seconds to download on a typical home's high-speed Internet connection.

But you don't get something for nothing. The MP3 codec, and others that achieve similar reductions in file size, are "lossy"; ie, of necessity they eliminate some of the musical information. The degree of this degradation depends on the data rate. Less bits always equals less music.

As a CD plays, the two channels of audio data (not including overhead) are pulled off the disc at a rate of just over 1400 kilobits per second. A typical MP3 plays at less than a tenth that rate, at 128kbps. To achieve that massive reduction in data, the MP3 coder splits the continuous musical waveform into discrete time chunks and, using Transform analysis, examines the spectral content of each chunk. Assumptions are made by the codec's designers, on the basis of psychoacoustic theory, about what information can be safely discarded. Quiet sounds with a similar spectrum to loud sounds in the same time window are discarded, as are quiet sounds that are immediately followed or preceded by loud sounds. And, as I wrote in the February 2008 "As We See It," because the music must be broken into chunks for the codec to do its work, transient information can get smeared across chunk boundaries.

Will the listener miss what has been removed? Will the smearing of transient information be large enough to mess with the music's meaning? As I wrote in a July 1994 essay, "if these algorithms have been properly implemented with the right psycho-acoustic assumptions, the musical information represented by the lost data will not be missed by most listeners.

"That's a mighty big 'if.'"

And while lossy codecs differ in the assumptions made by their designers, all of them discard—permanently—real musical information that would have been audible to some listeners with some kinds of music played through some systems. These codecs are not, in the jargon, "transparent," as can be demonstrated in listening tests (footnote 1).

So to us at Stereophile, the question of which lossy codec is "the best" is moot. We recommend that, for serious listening, our readers use uncompressed audio file formats, such as WAV or AIF—or, if file size is an issue because of limited hard-drive space, use a lossless format such as FLAC or ALC. These will be audibly transparent to all listeners at all times with all kinds of music through all systems.

Putting Codecs to the Test
Do I have any evidence for that emphatic statement?

For an article published in the March 1995 issue of Stereophile, I measured the early PASC, DTS, and ATRAC lossy codecs and put four of the test signals used for that article on our Test CD 3 (Stereophile STPH006-2). For the present article, I used two of those signals, tracks 25 and 26 on Test CD 3. But first, to set a basis for comparison, I used that most familiar of test signals: a 1kHz tone.

The spectrum of this tone, played back from CD, is shown in fig.1. The tone is the sharply defined vertical green line at the left of the graph. There are no other vertical lines present, meaning that the tone is completely free from distortion. Across the bottom of the graph, the fuzzy green trace shows that the background noise is uniformly spread out across the audioband, up to the 22kHz limit of the CD medium. This noise results from the 16-bit Linear Pulse Code Modulation (LPCM) encoding used by the CD medium. Each frequency component of the noise lies around 132dB below peak level; if these are added mathematically, they give the familiar 96dB signal/noise ratio that you see in CD-player specifications.

Fig.1 Spectrum of 1kHz sinewave at –10dBFS, 16-bit linear PCM encoding (linear frequency scale, 10dB/vertical div.).

Fig.2 shows the spectrum of this tone after it has been converted to an MP3 at a constant bit rate of 128kbps. (The MP3 codec I used for this and all the other tests was the Fraunhöfer, from one of the original developers of the MP3 technology.) The 1kHz tone is now represented by the dark red vertical line at the left of the graph. Note that it has acquired "skirts" below –80dB. These result, I believe, from the splitting of the continuous data representing the tone into the time chunks mentioned above, which in return results in a very slight uncertainty about the exact frequency of the tone. Note also that the random background noise has disappeared entirely. This is because the encoder is basically deaf to frequency regions that don't contain musical information. With its very limited "bit budget," the codec concentrates its resources on regions where there is audio information. However, a picket fence of very-low-level vertical lines can be seen. These represent spurious tones that result, I suspect, from mathematical limitations in the codec. Like the skirts that flank the 1kHz tone, these will not be audible. But they do reveal that the codec is working hard even with this most simple of signals.

Fig.2 Spectrum of 1kHz sinewave at –10dBFS, MP3 encoding at 128kbps (linear frequency scale, 10dB/vertical div.).

But what about when the codec is dealing not with a simple tone, but with music? One of the signals I put on Test CD 3 (track 25) simulates a musical signal by combining 43 discrete tones with frequencies spaced 500Hz apart. The lowest has a frequency of 350Hz, the highest 21.35kHz. This track sounds like a swarm of bees, but more important for a test signal, it readily reveals shortcomings in codecs, as spuriae appear in the spectral gaps between the tones..

Footnote 1: Something I have rarely seen discussed is the fact is that because all compressed file formats, both lossless and lossy, effectively have zero data redundancy, they are much more vulnerable than uncompressed files to bit errors in transmission.

Computer Audio Reviews

MP3 vs AAC vs FLAC vs CD

ARTICLE CONTENTS

ArtIcle Contents