An Amplifier Listening Test: A Comment on the Statistics
With respect to the authors (who, of course, have probably forgotten more about statistics than I ever learned), I find their statement that the proper use of the chi-square test is not a matter of debate too strict a guideline. I strongly feel that each test should be considered independent. My co-worker Will Hammond is adamant that the randomization of our tests would have successfully counteracted any tendency for dependency among the observations to develop.
Yes, if one person takes the test seven times, he or she doesn't change attitudes and preconceptions. But there was no causal relationship between any individual test and the ones that preceded and followed it.
For example, assume that I took the blind test 10 times and scored 10 correct identifications of "Same" or "Different." As I understand Professor Banks's argument, this would count as one test for determining significance, not 10, because all my responses would be governed by the same factors inherent in my listening and decision making. But surely common sense would dictate that, assuming no extraneous factor had crept into the test (such as a difference in level or audible noise between the two samples), this 100% scoring would have been due not to chance but to a real audible difference, and that the number of tests would have to be considered as 10 rather than 1. (10 out of 10 is significant; 1 out of 1 is not.)
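To put rough numbers on the 10-out-of-10 example, here is a minimal sketch of the arithmetic. It assumes that a listener who is purely guessing has a 50% chance of being right on each "Same"/"Different" trial, and that the trials are independent, which is precisely the assumption under dispute. The function name `p_at_least` is my own, chosen for illustration, and this is not the test Will Hammond used.

```python
from math import comb


def p_at_least(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided binomial tail: probability of scoring `correct` or more
    out of `trials` by guessing alone, assuming independent trials."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Assumption: chance performance is 0.5 per trial (two possible answers).
p_ten = p_at_least(10, 10)  # ten trials counted as ten observations
p_one = p_at_least(1, 1)    # the same session counted as one observation

print(f"10 of 10: p = {p_ten:.5f}")
print(f" 1 of 1:  p = {p_one:.5f}")
```

Under these assumptions, 10 correct out of 10 would arise by guessing about once in a thousand attempts, whereas 1 correct out of 1 would arise half the time. The whole disagreement is over which of those two counts properly describes the data.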
Certainly statistician Herman Burstein, in his analysis of the data (footnote 1), implicitly agreed that it was the number of tests, not the number of participants, that should be considered as the total for testing significance.
Regarding the use of the chi-square test, Will Hammond actually used a different test for significance for the bulk of the analysis.—John Atkinson
Footnote 1: See "Letters," October 1989 (Vol.12 No.10), pp.23-43.—John Atkinson