CD Tweaks & Listening Tests Page 2

Having got everything arranged to our satisfaction, the very first session on the Friday afternoon of the show went awry. The Esoteric transport uses an internal clamping system that completely covers the label surface of the CD; this turned out not to be compatible with the retaining rings for The Mod Squad Damper that had been fixed to two of the discs. I had prised these rings off to continue with the tests, only to find that a small amount of latex-like adhesive had been left on the CD's surface. When the disc was inserted into the Esoteric transport, this adhesive stuck to the clamping plate, with the result that from then on CDs stayed in the machine, the loading drawer emerging empty. The last five music selections in the first session were performed with both tweaked and non-tweaked CDs being played in the Philips transport, therefore. I removed the top plate of the Esoteric for the rest of the sessions and popped the CD free with a pencil eraser if it became stuck, allowing us to continue with the original plan. As Will explains later, however, so that we would have a complete set of music selections with the comparison done using the same transport, the Mahler and Julianne Baird presentations for the final session were performed with just the Philips transport. In this way, we would obtain a subset of the data which might indicate the degree of audibility of the effect of the tweaks alone under blind conditions.—John Atkinson

Will Hammond Analyzes the Results
For starters, the 1990 New Yorkers were different from the 1989 Californians (there's an obviousity for you) in several ways: not as many showed up to take part in listening tests, more left halfway through the test, and five score sheets with all yeses or all nos were turned in, despite our statements in the preliminary instructions that the tests weren't going to be all one way or another (none like this in San Mateo). Those quibbles aside, the New Yorkers were also a lot like the Westerners in that a) they mostly didn't distinguish between the two items under comparison very well (fig.1), and b) discriminatory skills varied considerably between bunches of them (fig.2).



Table 2: Overall Results

| Group | Correct Trials | Total Trials | Percent Correct |
|---|---|---|---|
| Males: Different CD transports | 1218 | 2529 | 48.2% |
| Females: Different CD transports | 151 | 296 | 51.0% |
| Males: Same CD transports* | 187 | 397 | 47.1% |
| Grand Totals | 1556 | 3222** | 48.3%*** |

* Only 1 female took part in the tests using the same transport.
** Five listeners made only 6 attempts at identification instead of 7.
*** None of these scores differ significantly from chance or 50%.

Although we gathered additional data from these tests (age, sex, and listening position), the total number of listeners was less. Four sessions had over 60 listeners, about the San Mateo average, but in the other four sessions there were only a half to two-thirds as many. Nevertheless, the totals were enough for useful data—416 males and 45 females. (No, the arithmetic in Table 2 isn't wrong—five listeners made only six instead of seven choices.)

Because of the CD-transport glitch described above by JA, the last five selections in Session I were compared using just the Philips transport. To even things out, the first two selections in Session VIII were also done that way. Thus, we had one complete round of all the selections in which the CD transport was the same and only the CDs were different. In the rest, both CDs and transports were different. But to use an old Smoky Mountain expression, "it didn't make no nevermind." Whether male or female, whether only CDs were different, or whether both CDs and transports were different, in the aggregate the correctness of the scores was not different from a 50–50 situation (see Table 2).
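As a rough check on the "not different from 50–50" statement, the Table 2 counts can be compared with chance using a normal approximation to the binomial. This is only a sketch, not the analysis the authors ran: the function names are invented for illustration, and it assumes every trial is an independent coin flip, which ignores the fact that each listener contributed up to seven answers.

```python
import math

# Correct and total trials as listed in Table 2.
groups = {
    "Males, different transports": (1218, 2529),
    "Females, different transports": (151, 296),
    "Males, same transport": (187, 397),
    "Grand total": (1556, 3222),
}

def z_vs_chance(correct, total, p0=0.5):
    """z-score of an observed proportion against a chance rate p0
    (normal approximation to the binomial)."""
    standard_error = math.sqrt(p0 * (1 - p0) / total)
    return (correct / total - p0) / standard_error

def two_sided_p(z):
    """Two-sided tail probability for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

for name, (correct, total) in groups.items():
    z = z_vs_chance(correct, total)
    print(f"{name}: {correct / total:.1%} correct, "
          f"z = {z:+.2f}, p = {two_sided_p(z):.3f}")
```

Under these assumptions all four z-scores have a magnitude below 1.96, the usual two-sided 5% threshold, which agrees with the third footnote to Table 2.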

Obviously, some citizens could discriminate better than others. Using our previous criterion of the "keen-eared observer" (KEO) as someone who got 5 or more correct out of 7, there were 82 of them (73 male or 19.7%, 9 female or 20%), or 17.8% of the total listeners. This is about 10% less than the proportion of KEOs in last year's amplifier comparison test. Regional chauvinists could do something with that, but maybe it's just that the CD difference is less than the power-amp difference. ¿Quien sabe?
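It may help to know how often the KEO criterion would be met by guessing alone. The sketch below assumes each of the seven answers is an independent 50/50 guess and treats all 461 listeners as having made seven choices (five actually made six); the function name is made up for this example.

```python
from math import comb

def chance_of_at_least(k, n=7, p=0.5):
    """Probability of k or more correct answers out of n by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_keo = chance_of_at_least(5)   # (21 + 7 + 1) / 128
listeners = 416 + 45            # males + females, from the text
expected_keos = p_keo * listeners

print(f"Chance of 5 or more out of 7 by guessing: {p_keo:.1%}")
print(f"Expected among {listeners} guessers: {expected_keos:.0f}; observed: 82")
```

On these assumptions, guessing alone would put a little over a fifth of listeners at 5 or better, so the KEO count by itself cannot separate skilled listeners from lucky ones.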

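Table 3: Different CD transports, total scores by Music Selection, Males

| Session | Correct Trials | Total Trials | Percent Correct | Music Selection | Correct Trials | Total Trials | Percent Correct |
|---|---|---|---|---|---|---|---|
| I | 59 | 122 | 48.4% | 1 | 194 | 380 | 51.1% |
| II | 204 | 456 | 44.7% | 2 | 205 | 380 | 53.9% |
| III | 134 | 266 | 50.4% | 3 | 145 | 349 | 41.5% |
| IV | 289 | 609 | 47.5% | 4 | 182 | 355 | 51.3% |
| V | 211 | 469 | 45.0% | 5 | 165 | 355 | 46.5% |
| VI | 86 | 168 | 51.2% | 6 | 151 | 355 | 42.5% |
| VII | 141 | 259 | 54.4% | 7 | 176 | 355 | 49.3% |
| VIII | 94 | 180 | 52.2% | | | | |
| Totals | 1218 | 2529 | | | 1218 | 2529 | |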

Table 4: Results by music selection, independent of session (Males only, different CD transports)

| Music Selection | Test | Correct Trials | Total Trials | Percent Correct | Total Score | Percent Correct | Significance |
|---|---|---|---|---|---|---|---|
| 1. Mahler | A-A | 26 | 67 | 38.8% | 62/194 | 32.0% | p<0.001 |
| | B-B | 36 | 127 | 28.3% | | | |
| | A-B | 73 | 111 | 65.8% | 132/186 | 71.0% | |
| 2. J. Baird | A-A | 34 | 61 | 55.7% | 88/148 | 59.5% | Not significant |
| | B-B | 54 | 87 | 62.1% | | | |
| | A-B | 95 | 194 | 49.0% | 117/232 | 50.4% | |
| | B-A | 22 | 38 | 57.9% | | | |
| 3. Sibelius | A-A | 71 | 185 | 38.4% | 109/289 | 37.7% | p<0.005 |
| | B-B | 38 | 104 | 36.5% | | | |
| | A-B | 20 | 36 | 55.6% | 36/60 | 60.0% | |
| | B-A | 16 | 24 | 66.7% | | | |
| 4. Paul Simon | A-A | 63 | 140 | 45.0% | 73/164 | 44.5% | p<0.005 |
| | B-B | 10 | 24 | 41.7% | | | |
| | A-B | 16 | 38 | 42.1% | 109/191 | 57.1% | |
| | B-A | 93 | 153 | 60.8% | | | |
| 5. Bach Organ | A-A | - | - | - | 33/74 | 44.6% | Not tested |
| | B-B | 33 | 74 | 44.6% | | | |
| | A-B | 16 | 37 | 43.2% | 132/281 | 47.0% | Not significant |
| | B-A | 116 | 244 | 47.5% | | | |
| 6. Chopin | A-A | 10 | 24 | 41.7% | 95/244 | 38.9% | p<0.005 |
| | B-B | 85 | 220 | 38.6% | | | |
| | A-B | 21 | 36 | 58.3% | 56/111 | 50.5% | |
| | B-A | 35 | 75 | 46.7% | | | |
| 7. Praeludium | A-A | 33 | 87 | 37.9% | 70/162 | 43.2% | p<0.005 |
| | B-B | 37 | 75 | 49.3% | | | |
| | A-B | 86 | 157 | 54.8% | 106/193 | 54.9% | |
| | B-A | 20 | 36 | 55.6% | | | |

A is the untreated CD in the Philips CD880 transport
B is the tweaked CD in the Esoteric P2 transport

Table 5: Results per session, independent of selection (Males only, different CD transports*)

| Session | A-A Correct | B-B Correct | A-B Correct | B-A Correct | Total Same Correct | Total Diff. Correct | Significance |
|---|---|---|---|---|---|---|---|
| I | - | 22/61 | 37/61 | - | 22/61 | 37/61 | <0.01 |
| | - | 36.1% | 60.7% | - | 36.1% | 60.7% | |
| II | 24/60 | 42/132 | 62/132 | 76/132 | 66/192 | 138/264 | <0.01 |
| | 40.0% | 31.8% | 47.0% | 57.6% | 34.4% | 52.3% | |
| III | 21/38 | 30/76 | 16/38 | 67/114 | 51/114 | 83/152 | P=90% limit |
| | 55.3% | 39.5% | 42.1% | 58.8% | 44.7% | 54.6% | |
| IV | 59/174 | 83/174 | 57/87 | 90/174 | 142/348 | 147/261 | <0.01 |
| | 33.9% | 47.7% | 65.5% | 51.7% | 40.8% | 56.3% | |
| V | 60/134 | 50/134 | 69/134 | 32/67 | 110/268 | 101/201 | <0.05 |
| | 44.8% | 37.3% | 51.5% | 47.8% | 41.0% | 50.2% | |
| VI | 20/48 | 10/24 | 29/48 | 27/48 | 30/72 | 56/96 | <0.05 |
| | 41.7% | 41.7% | 60.4% | 56.3% | 41.7% | 58.3% | |
| VII | 40/74 | 36/74 | 16/37 | 49/74 | 76/148 | 65/111 | ns |
| | 54.1% | 48.6% | 43.2% | 66.2% | 51.4% | 58.6% | |
| VIII | 13/36 | 20/36 | 41/72 | 20/36 | 33/72 | 61/108 | <0.05 |
| Totals | 237/564 | 293/711 | 327/609 | 361/645 | 530/1275 | 688/1254 | |
| Percent correct | 42.0% | 41.2% | 53.7% | 56.0% | 41.6% | 54.9% | |

A is the untreated CD in the Philips CD880 transport
B is the tweaked CD in the Esoteric P2 transport

Overall, discriminating capability didn't vary all that much from one session to another, nor were the various selections significantly different as test items. This can be seen from Tables 3, 4, and 5, which show the breakdown of the results for male listeners with the different CD transports used both by session and by music selection.

Already noted is that females did not appear to be better discriminators than males (at least on this test), although their numbers were rather smaller; perhaps the sample less than fully represented the female audiophile population, which is not a very big group anyway. The age analysis revealed a couple of things not expected: the rather wide range from 14 to 77, and the number over 60 (19, or nearly 5%; fig.3). The plot of raw scores vs age (fig.4) shows, as might be expected, that there were fewer KEOs among the over-50 group (upper-right corner of plot). But look at the lower-right corner of the plot—there were also fewer less-than-median observers there, too. Is presbycusis compensated for by experience? ¿Una mas, quien sabe?



The listener-position effect was, to me anyway, unexpected. Listeners' score sheets were divided according to where the listener indicated he or she sat with respect to the speakers, as shown in fig.5. In dividing the listening space this way, we assume that a) listeners in Area I would generally be (more or less) equidistant from the speakers and close enough to avoid late reflections; b) listeners in Area II would be exposed to speaker imbalance and wall reflections; while c) listeners in Area III might be too far back to get clean direct-sound nuances and would be exposed to more miscellaneous distractions (doors opening/closing, chitchat, etc.) than the others.


Table 6: Effect of Listener Position

| Score out of 7 | Percentage, Area I | Percentage, Area II | Percentage, Area III |
|---|---|---|---|
| 0, 1, or 2 | 19.4% | 26.2% | 24.0% |
| 3 or 4 | 61.1% | 61.0% | 54.2% |
| 5, 6, or 7 | 19.4% | 12.8% | 21.9% |

While we expected that Area II would do least well, what was not expected was that the scores from Area III would tend to be a bit better than those from Area I (as seen by the shift on the downslope of the lines in fig.5), although the numbers of KEOs in Areas I and III aren't all that different, as shown in Table 6.

Although it was tempting to do so, this aspect of the data was not further analyzed because of the relative imprecision of the listeners' locations as recorded on the scoresheets, the different numbers in some areas when groups were smaller, etc. At present, it mostly reassures us that the back third of that size room is still okay for listening tests like this.

Now to the most interesting part, again with déjà vu. As we found in last year's power-amp comparison, difference was heard at a significantly higher incidence than was sameness. In only two sessions were the total correctly identified differences not significantly greater than the total correctly identified samenesses; in these sessions (III and VII) the differences were also correctly heard more often, but not with statistical significance (Table 5). When assessed by selection, the same phenomenon was noted for five of the seven selections (Table 4). The smallish number of tests for sameness in selection five (and no tests of the A-A variety) possibly helped keep that one from significance, while selection two was the other way around, though not significantly so. Why? Good question. Wish I had a good answer.

Table 7: Analysis of results when CD transports Different or Same

| Group | Same presentations: Correct Trials | Percent Correct | Different presentations: Correct Trials | Percent Correct | Significance |
|---|---|---|---|---|---|
| Males: Same CD transports | 104/239 | 43.5% | 83/158 | 52.5% | * |
| Females: Different CD transports | 73/158 | 46.2% | 78/138 | 56.5% | * |
| Males: Different CD transports | 530/1275 | 41.6% | 688/1254 | 54.9% | p<0.01** |
| Totals | 707/1672 | 42.3% | 849/1550 | 54.8% | p<0.01** |

* Probability at 90% confidence limits.
** Highly significant.—WH

What to conclude from it all? Several things, among which, to me, the most interesting is the repeated observation that difference is more reliably discerned than is sameness under the conditions of these listening tests (see Table 7). Given that this is so, why? At present, the explanation suggested by John Koval (footnote 4) seems most plausible; ie, listeners are more confident of differences and thus tend, when doubtful, to score in favor of differences because they are less confident of their ability to discern sameness and don't want to miss what they suspect is a subtle difference. Whatever the reason, as a seemingly basic principle, this tendency will, in the future, have to be taken into account when designing listening tests.
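The tendency to score in favor of differences can be read straight off the Table 7 counts: every wrong answer on a "same" presentation is a "different" response, as is every right answer on a "different" presentation. The short sketch below adds these up. The function name and row labels are illustrative only, and it assumes each answer was a plain same/different choice.

```python
def different_response_rate(same_correct, same_total, diff_correct, diff_total):
    """Share of all presentations on which listeners answered "different"."""
    # An incorrect answer on a "same" presentation is a "different" response.
    said_different = (same_total - same_correct) + diff_correct
    return said_different / (same_total + diff_total)

# (same correct, same total, different correct, different total), from Table 7.
rows = {
    "Males, different transports": (530, 1275, 688, 1254),
    "All listeners": (707, 1672, 849, 1550),
}

rates = {name: different_response_rate(*counts) for name, counts in rows.items()}
for name, rate in rates.items():
    print(f"{name}: answered 'different' on {rate:.1%} of presentations")
```

Both rates come out above one half, which is what a lean toward "different" when in doubt would produce.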

Will you hear better sound from your CDs if you treat them with Armor All, or color the edges green? Well, Armor All seems to be out now (see Sam Tellig's writings in the June and September 1990 issues), but the basic answer to the question seems to be a conditional "yes." That is, if you have sufficiently honed listening skills and can thus detect fairly subtle differences, you will probably hear a slight improvement (or at least a difference—improvement is, after all, a matter of taste). If you can't, you won't.

This, of course, leads into a whole other area: that of singly evaluating minor changes in one aspect of the whole replay chain. This is neither the time nor the place for a full discussion; I'll touch on just two facets of the complexity and how these differ. Whether a small change (such as tested here) produces an audible sonic difference depends entirely on the extent to which the auditor can discern sonic differences of this magnitude. Put the other way around, sonic differences may well exist, but if they are below the detectability threshold of the KEO, they might as well not exist. We can't be dogmatic about the absence of the difference; all that can be said is that it can't be heard. The foregoing concerns observability and is, in computer jargon, a go/no-go matter. Whether or not a difference, once heard, is of importance to an individual listener enters the area of value judgments, and must be left there.

Statistical assessment was done by both exact probability tables and by chi square, for those who are interested.
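For readers who want to try the chi-square approach themselves, here is a minimal sketch applied to the male, different-transport totals (530/1275 correct on "same" presentations, 688/1254 on "different"). It uses the textbook 2x2 Pearson formula without a continuity correction, which may not be the exact variant used for the published figures; the names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; one degree of freedom."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

same_correct, same_total = 530, 1275   # "same" presentations
diff_correct, diff_total = 688, 1254   # "different" presentations

chi2 = chi_square_2x2(
    same_correct, same_total - same_correct,   # same: right, wrong
    diff_correct, diff_total - diff_correct,   # different: right, wrong
)

CRITICAL_P01 = 6.635  # chi-square critical value for p = 0.01, 1 degree of freedom
print(f"chi-square = {chi2:.1f}; beyond the p<0.01 threshold: {chi2 > CRITICAL_P01}")
```

The statistic lands far beyond the p<0.01 critical value, consistent with the significance quoted for that row of Table 7.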

Final note: at every session, we asked that somebody come back and take the test again, so we could have some data on reproducibility. Three listeners did so, and all three had the same score both times. Thanks to all who participated, and especially to those three.—Will Hammond

Overall Conclusion
As with our 1989 amplifier tests, the outcome of all this work is a somewhat ambiguous set of results.

Some, of course, will say that the results show that the overall scoring was no different from chance, meaning that no differences can be heard between the maximally tweaked CDs and the vanilla versions—ever. My interpretation is that under the conditions of the test, some listeners could and did hear a difference.

Take my own results. Although I was inserting the discs into the two transports for each presentation, I was not aware which one was being used to drive the Proceed's DACs, the only exceptions being Session I, when I gummed up the Esoteric drive with adhesive, and Session III, when Will Hammond was unavailable to help during the first three or so presentations. I therefore regarded it as legitimate to make my own identifications of Same or Different, and did do so during one of the final sessions. I had spent sufficient time auditioning the tweaked and non-tweaked CDs during the previous sessions and during setup to be confident of identifying Same or Different under blind conditions, even if when presented with just one or the other, I couldn't be sure of making an absolute identification.

The differences were undoubtedly small under these conditions, which is, of course, not the same as saying they don't exist. To these ears, the tweaked CD in the Esoteric transport connected with the MAS interconnect had slightly more of a sense of ease to its overall sound, with a less-fatiguing treble. The bulk of this difference I would ascribe to the Esoteric transport, which Robert Harley measured as having a very low level of data jitter in its output.

So, how did I score? I was correct six times out of seven. I am sure that a statistician would not accept this score as being evidence of any audible difference, but I was quite pleased with myself, nevertheless. It's nice to have evidence that you are a KEO.
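The statistician's objection can be put in numbers. Assuming each of seven answers is an independent 50/50 guess, the sketch below gives the one-sided probability of scoring at least six, and finds the lowest score out of seven that would fall under the conventional 5% level. The helper name is invented for the example.

```python
from fractions import Fraction
from math import comb

def guess_p_value(correct, trials=7):
    """One-sided probability of at least `correct` right answers by guessing."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return Fraction(ways, 2 ** trials)

for score in (6, 7):
    p = guess_p_value(score)
    print(f"{score}/7 or better by guessing: {p} = {float(p):.4f}")

# Lowest score out of 7 whose one-sided p-value is below 0.05.
threshold = min(s for s in range(8) if guess_p_value(s) < Fraction(1, 20))
print(f"Score needed for p < 0.05 in a single 7-trial run: {threshold}/7")
```

With only seven trials, six correct has a 1-in-16 chance of happening by luck, so a single run of this length needs a perfect score to clear the 5% bar.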

To conclude, I'd like to comment on the whole business of objectivity vs subjectivity as raised in the New York listening tests, and further discussed in this month's "Letters" column, by quoting the MIT computer scientist Joseph Weizenbaum (footnote 5) on the distrust expressed by the scientific community, indeed the distrust shown by Western culture in general, of the subjective experience:

"This newly created reality was and remains an impoverished version of the older one, for it rests on a rejection of those direct experiences that formed the basis for, and indeed constituted, the old reality. The feeling of hunger was rejected as a stimulus for eating; instead, one ate when an abstract model had achieved a certain state ie, when the hands of a clock pointed to certain marks on the clock's face...This rejection of direct experience was to become one of the principal characteristics of modern science. It was imprinted on Western European culture not only by the clock but also by the many prosthetic sensing instruments, especially those that reported on the phenomena they were set to monitor by means of pointers whose positions were ultimately translated into numbers...experiences of reality had to be representable as numbers in order to appear legitimate in the eyes of the common wisdom."

Those who place their belief in measurements alone should remember that it is still the experience itself that matters, not the description of the experience, no matter how thorough or well-researched that description.—John Atkinson

Footnote 4: In a letter to the editor, published in Stereophile, Vol.12 No.10, October 1989, pp.43–45.—John Atkinson

Footnote 5 From Computer Power and Human Reason (W.H. Freeman, 1976), as quoted in George Johnson's Machinery of the Mind (Tempus, 1986), p.267.—John Atkinson


smargo's picture

my head around articles that were printed so long ago - especially when it comes to cd players and digital - I'd rather see articles that pertain to the hear and now or some record reviews that aren't in the magazine

Anton's picture

Time for a 'where are they now' follow up? (I know they still make it, I meant it in terms of current utility.)

Has improved technology negated the usefulness of this tweak?

Can newer measurement techniques find out what they did?

Smargo, this is still a current product. They are 'hear and now!'

la musique's picture

Silly me when 20 or so years ago I bought the pen and did most of my CDs.
They now look so ugly and to be very honest, the stuff never made any difference to the sound.
I tried black markers, different green shades, and no difference.
I do have a good CD player (Audiomeca Mephisto M2) and to be honest the big difference is in the recording of the CD and not what snake oil you can put on the plastic.

volvic's picture

Tried it at a store, did not buy it. Made no difference to sound and now that CD looks hideous. Thankfully was very skeptical with this tweak and didn't go full on with the few CD's I had back then.

dalethorn's picture

I'd imagine that CD players are better today, with better memory buffers that can make corrections in real time. In other words, I'd expect a good CD player to be able to match a bit-perfect CD rip with full error correction. I don't know if that was possible circa 1990.

jmsent's picture

had to make corrections in real time and were perfectly capable of doing so. It was by design. Look up "interleaving" and "Reed Solomon error correction" which were integral parts of the Redbook specification. Without those, the system couldn't work. The biggest advancement was the ability for the transports to track the disc without skipping, even in the presence of large scratches. By 1990, most decent CD transports were extremely capable in this regard.

dalethorn's picture

I guess my PC-based DVD/CD players weren't, even with error correction checked. I had a copy of Bowie's Diamond Dogs that sounded partly garbled on a couple of different players. So I ripped it with error correction enabled, and while it took a while to complete, the CD rip was perfect. That experience told me that real-time error correction would not handle every case, even if it worked 98 percent of the time. That's not to say that ripping could correct every case, but I think it gets closer to 100 percent.

Robin Landseadel's picture

This is pretty much in the rear window and without re-testing with the modern, post 24/192 and SACD DACs built into common audio gear, this article has zero contextual value. I don't think the "fractal" theory holds up. Below a certain audio level, those effects are masked by anything louder, and "louder" music happens to be the most popular flavor, audiophiles be damned.

What does count is that many sonic qualities that people like and desire in analog gear are obvious distortions. Compression is a distortion, a necessary one in audio production for domestic consumption. Many flavors of analog compression are preferred over digital iterations by audiophiles and nostalgists, but not by others. And the issues of low-level resolution, addressed initially by various tweaks, are now addressed with oversampling and better jitter performance. To these old ears, modern digital playback has gone through a quantum leap in audio quality. When I used green markers on CDs, digital sound was uniformly awful. As of now, I have every good reason to prefer digital reproduction over analog reproduction, and I do.

downunderman's picture

One funny thing for me is that CDs with a black label have on average tended to sound better than ones that are predominately silver on the label side.

All anecdotal I know and maybe I am subconsciously pining for vinyl, but there you go.