Do blind tests really prove small differences don't exist?

Some blind tests are incredibly difficult to set up, as was just mentioned. Others are easy. I fault people for not running the easy ones, but I appreciate them not taking on the difficult ones :).

Someone once asked me why I didn't blind test something. I said it is a simple formula for me: how much I could learn relative to how much work it is. Earlier I asked if people did double-blind tests for grocery items. Clearly no one did. The reason is that they don't really care to learn something new there, and it is work to do them. Accepting such a thing should not be a sign of people being against science but rather of being human. 99% of our disagreements here are not because of our differing audio views but because of who we are as people and what motivates us to do something versus not....

Just expanding and not disagreeing with you, Amir.
One example I find similar to your grocery one is someone who promotes blind auditioning of audio equipment and yet does not remove sight from the equation when auditioning speakers, perhaps feeling it is not as important.
By this I mean that the Harman group showed not only that there is bias relating to marketing-expectation of a product (which can be overcome with debiasing techniques and mechanisms covered in other research papers) but that the actual position of the speaker in the room also skewed results (location-perception bias will be much harder to overcome than the marketing-expectation type).
Meaning, you cannot even audition speakers correctly unless you remove your view of where they are positioned in the room.

Now how many have bothered to ensure they are not aware of where the speakers (or possibly even other gear) are located, while promoting blind testing of audio gear?

Thanks
Orb
 
Funny (again :) ) that you should mention Harman, Orb, as I just finished spending a day at Harman. The aforementioned dinner was actually with Harman folks, Keith Yates and some of the other Harman dealers! It was an incredible day even though this was not my first visit there. I will be writing a lot of what I learned and experienced for another thread. But for now, you were very right. They measured a 25% scoring difference based on the positioning of the speaker. No wonder, of course, given the room interactions at lower frequencies. Matching speaker locations is therefore important in a blind test, and had someone not done that, but done everything else perfectly, their results would not be valid! Knowing how to create a blind test is usually far harder than actually performing one. It is so easy to create the wrong test.
 
Yeah, good point, and a reason for their speaker shuffler.
Just to add: even if the speaker did not change position, the listeners' subjective rating of that speaker changed between seeing its position and not.
This was noticed because they had extensively captured the listeners' behaviour-heuristics while testing and comparing multiple blind and sighted position tests.
Their data showed that blind listening produced greater preference differences between position 1 and position 2 for a given speaker, whereas sighted listening actually marginalised the preference difference between the positions.
Furthermore, one speaker (D) actually reversed its preference, in that blind position 2 gave a stronger result than position 1, while sighted position 1 gave a slightly higher result than position 2.
http://3.bp.blogspot.com/_w5OVFV2Gs...1600-h/BlindVsSightedPositionInteractions.png

Before anyone comments, please note that I am focusing on location-perception values. Yes, the image also supports the more traditional points about blind testing that others have mentioned and we all agree on, but that is not the point I want to emphasise in this post.

Anyway the joy of cognitive-perception-heuristic-behaviour :)
Cheers
Orb
 
Emphasis mine.

Blind tests are the best tests for finding out what's GOOD ENOUGH

I guess I haven't been communicating effectively. Because I do audio production, I'm aware of the potential for audio to be routed through the same processing and conversion steps many times. Therefore a DBT of a signal passing through a DAC just once is not appropriate. It might be good enough for an audiophile who only passes his music through a DAC once, but it isn't good enough for audio production people. A lot of modern equipment is dual-use, in that it is used by both audiophiles and production people.

For example, Benchmark has been serving professionals for decades. I used Benchmark mic preamps to record musical sounds for my old PCABX web site back in 2000 and 2001.

Being aware of this, I devised a means for determining the effects of passing audio through the same piece of equipment many, many times, even when there is only one unit of the equipment to test. This procedure is non-masking in the sense that it may make the UUT (unit under test) sound worse than it really does, but it does not affect the sound quality of the audio signal it is being compared to.
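To make the shape of such a test concrete, here is a minimal sketch of the general multi-pass idea (a toy model with an assumed linear UUT exhibiting a slight frequency-response tilt; it is not the actual procedure, just an illustration of why cascading is non-masking):

```python
# Toy cascade test: pass a signal through an assumed UUT model N times and
# compare against the untouched reference, so small defects accumulate.
import numpy as np

def one_pass(signal: np.ndarray, tilt_db: float = 0.1) -> np.ndarray:
    """Toy UUT: a gentle high-frequency rolloff applied in the FFT domain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.linspace(0.0, 1.0, spectrum.size)   # 0 .. Nyquist, normalized
    gain = 10 ** (-tilt_db * freqs / 20)           # -0.1 dB at Nyquist
    return np.fft.irfft(spectrum * gain, n=signal.size)

rng = np.random.default_rng(0)
reference = rng.standard_normal(48000)             # 1 s of noise at 48 kHz

cascaded = reference.copy()
for _ in range(20):                                # 20 passes through the UUT
    cascaded = one_pass(cascaded)

# Only the UUT path is cascaded; the reference is untouched, so any audible
# difference overstates (never understates) a single pass -- "non-masking".
err_db = 20 * np.log10(np.std(cascaded - reference) / np.std(reference))
print(f"cascade error relative to reference: {err_db:.1f} dB")
```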

To put this all into perspective, we are all subjected to the effects of equipment such as microphones, headphones, and loudspeakers, whose degrading effects are easy to hear in just one pass of the audio signal through them.

So, here is the true box score. We are able to do DBTs that will yield audible differences even when the UUT is sonically transparent when an audio signal passes through it just once. Thus, DBTs can be far more sensitive than what it takes to determine that the equipment is GOOD ENOUGH. We can do DBTs for equipment performance at the Ne Plus Ultra level.

For example, Ethan and I have been doing DBTs of this kind for several years. I don't remember exactly when I did my first one, but it was 2001 or earlier.

In addition, the ABX group started doing tests involving varying kinds of distortion over a range that was completely under our control in the late 1970s or early 1980s. Our distortion-generating methodologies can create distortion that is exceedingly similar, for all practical purposes identical, to that which exists in real-world equipment. You want a certain amount of IM, THD or jitter? You want it to increase at, say, high frequencies and low frequencies? No problem!
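As a sketch of what "completely under our control" can mean in practice (an assumed minimal example, not the ABX group's actual generator), consider a pure tone plus a second harmonic scaled to an exact THD figure:

```python
# Assumed minimal controlled-distortion generator: a pure tone plus a 2nd
# harmonic whose amplitude sets the THD exactly.
import numpy as np

def tone_with_thd(freq_hz: float, thd_percent: float,
                  fs: int = 48000, seconds: float = 1.0) -> np.ndarray:
    """Sine at freq_hz plus a 2nd harmonic scaled to the requested THD."""
    t = np.arange(int(fs * seconds)) / fs
    fundamental = np.sin(2 * np.pi * freq_hz * t)
    harmonic = np.sin(2 * np.pi * 2 * freq_hz * t)
    # THD (amplitude ratio of harmonic to fundamental) is set directly:
    return fundamental + (thd_percent / 100.0) * harmonic

stimulus = tone_with_thd(1000.0, 0.5)   # 1 kHz tone with exactly 0.5% THD
```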

Once we know that a certain amount of distortion is audible, it is a simple exercise in arithmetic to divide those percentages of distortion by whatever number we desire for a safety margin and set our performance goals and standards that much higher. This is of course now routine practice - much audio gear is, as I said, far better than merely GOOD ENOUGH. We can and have used DBTs to set our performance standards at the Ne Plus Ultra level.
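The arithmetic really is that simple; as a worked example with assumed numbers:

```python
# Worked example of the safety-margin arithmetic (numbers are assumed, for
# illustration only): if listening tests put the audibility threshold at
# 1.0% THD and we want a 20x margin, the design target is 0.05%.
audible_thd_percent = 1.0          # hypothetical measured threshold
safety_factor = 20                 # chosen engineering margin
design_target = audible_thd_percent / safety_factor
print(f"design target: {design_target:.3f}% THD")   # -> 0.050% THD
```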

To reiterate, by several different means DBTs can and have been used to stimulate the development of equipment that performs far better than merely GOOD ENOUGH.
 
I don't know whether they still do blind testing now.

As I mentioned before, DBTs are widely used in the food and beverage industries. One of the best papers I've ever read about DBTs was written by the brewmaster for one of the largest breweries in the US.

Both Aldi and Trader Joe's are, AFAIK, factoryless, farmless operations. Subcontractors do the hands-on work related to actually obtaining the raw materials, processing, and packaging. All of these steps are potential targets of professional taste testing, and at times that naturally includes DBTs. In some cases the same facilities do work for both well-known, highly advertised brand names and house brands. In some cases the identical product is simply packaged differently.

Having lived in Germany, I sense that a fair amount of Aldi's food has a definite European flavor to it even when it is produced in the US. This tells me that they exercise some central control over the taste of the food they sell. Some of the products they sell are identical to regional specialties that I enjoyed when I lived in Bavaria and are labelled as being made in that area. Other than the labels, much of it seems to be identical. I suspect that they exploit the lower food costs in the US by sending some products made here back to Europe.

There is a well-known feature of technology where over the years technical differences tend to disappear because everybody migrates to the one or few most acceptable way(s) to do things. Information about trade secrets rarely remains secret for long. People get laid off, they change jobs for other reasons, and they simply can't and usually don't want to forget what they know.
 
They measured a 25% scoring difference based on the positioning of the speaker. No wonder, of course, given the room interactions at lower frequencies. Matching speaker locations is therefore important in a blind test, and had someone not done that, but done everything else perfectly, their results would not be valid! Knowing how to create a blind test is usually far harder than actually performing one. It is so easy to create the wrong test.

AFAIK I did the first ABX test of loudspeakers back in the late 70s or early 80s. The problems related to speaker positioning were made clear, talked about in the AES, and written about and circulated at that time. When I became involved with audio conferencing over Usenet in the middle 90s, that same information was re-circulated to a far wider audience.

Most audiophiles seem to underestimate the effects of rooms on speakers. Based on tests that our audio club did with moving good speaker systems around, much if not most of the character of the sound of a given speaker depends on the room. When speakers are switched between rooms, the characteristic sound of the system often stays with the room.
 
Emphasis mine.

This is what I've been saying here for a year now. Blind tests are the best tests for finding out what's GOOD ENOUGH. Is there anything wrong with that? Hell no. It saves manufacturers and consumers valuable resources. Here's the catch though: I have to take the test. A DBT in a magazine review is just the findings of a test procedure and a panel I know nothing about. That to me is as useless as reading a review by someone whose biases I know nothing about.

My system continues to involuntarily make my hair stand, and to make me dance and do fist pumps. I have a tertiary system that in all honesty I could live with. It is good enough. So yes, my main system is overkill, but I love what it does to and for me. Anybody have a problem with that? Before calling me out, go look in the mirror and ask yourselves if you've never gone overboard on anything because you felt you deserved to give yourself a treat.

Good enough? I guess that depends on how overkill is defined in this context and where 100 to 1000 times better takes you. What simple blind listening can do is tell you what you can hear, and if you can, what you prefer. And yes, that's good enough, unless what's making your hair stand on end is the way your system looks.

Tim
 
Good enough? I guess that depends on how overkill is defined in this context and where 100 to 1000 times better takes you. What simple blind listening can do is tell you what you can hear, and if you can, what you prefer. And yes, that's good enough, unless what's making your hair stand on end is the way your system looks.

I would take a little broader view of that. If listening to your audio system provides you with a certain emotional reaction, then that reaction is potentially related to everything that went before it. For example, if you made love to someone in the same room, your reaction to being in that room with your audio system may be somewhat related to the love making, not just the music. ;-)

We see this in people's descriptions of playing vinyl. The ritual that leads up to the actual playback is often said by them to be part of the favorable experience. Can't argue with that!
 
AFAIK I did the first ABX test of loudspeakers back in the late 70s or early 80s. The problems related to speaker positioning were made clear, talked about in the AES, and written about and circulated at that time. When I became involved with audio conferencing over Usenet in the middle 90s, that same information was re-circulated to a far wider audience....

Arny, are any of those tests and presentations of yours, and the AES presentations and discussions around this subject, still available?
Very interested to see how it evolved from those days to what Floyd/Sean did afterwards from '86 onwards.
Were the AES discussions on ABX specifically, or on more general blind subjective testing (correlated, where required, against objective measurements), like what I have seen in Floyd's and Sean's papers?

Thanks
Orb
 
I guess I haven't been communicating effectively. Because I do audio production, I'm aware of the potential for audio to be routed through the same processing and conversion steps many times. Therefore a DBT of a signal passing through a DAC just once is not appropriate. It might be good enough for an audiophile who only passes his music through a DAC once, but it isn't good enough for audio production people. A lot of modern equipment is dual-use, in that it is used by both audiophiles and production people.

For example, Benchmark has been serving professionals for decades. I used Benchmark mic preamps to record musical sounds for my old PCABX web site back in 2000 and 2001.

Being aware of this, I devised a means for determining the effects of passing audio through the same piece of equipment many, many times, even when there is only one unit of the equipment to test. This procedure is non-masking in the sense that it may make the UUT (unit under test) sound worse than it really does, but it does not affect the sound quality of the audio signal it is being compared to.

To put this all into perspective, we are all subjected to the effects of equipment such as microphones, headphones, and loudspeakers, whose degrading effects are easy to hear in just one pass of the audio signal through them.

So, here is the true box score. We are able to do DBTs that will yield audible differences even when the UUT is sonically transparent when an audio signal passes through it just once. Thus, DBTs can be far more sensitive than what it takes to determine that the equipment is GOOD ENOUGH. We can do DBTs for equipment performance at the Ne Plus Ultra level.

For example, Ethan and I have been doing DBTs of this kind for several years. I don't remember exactly when I did my first one, but it was 2001 or earlier.

In addition, the ABX group started doing tests involving varying kinds of distortion over a range that was completely under our control in the late 1970s or early 1980s. Our distortion-generating methodologies can create distortion that is exceedingly similar, for all practical purposes identical, to that which exists in real-world equipment. You want a certain amount of IM, THD or jitter? You want it to increase at, say, high frequencies and low frequencies? No problem!

Once we know that a certain amount of distortion is audible, it is a simple exercise in arithmetic to divide those percentages of distortion by whatever number we desire for a safety margin and set our performance goals and standards that much higher. This is of course now routine practice - much audio gear is, as I said, far better than merely GOOD ENOUGH. We can and have used DBTs to set our performance standards at the Ne Plus Ultra level.

To reiterate, by several different means DBTs can and have been used to stimulate the development of equipment that performs far better than merely GOOD ENOUGH.

You're not the only one who's sat behind a console, Arny. You also keep dropping Ethan's name. It's not as if Ethan hasn't done any tests that were purely measurement in nature.

I was watching TV and an infomercial came on. It was some diet pill that claimed to be scientifically proven to reduce body fat and nothing but body fat. In little letters it said the usual not-FDA whatever at the same time the lady was saying "Proven in a blind test carried out in a University". What's my point? Any test is subject to abuse, Arny. Would you buy those pills just because some stranger said they were blind tested? I'm no fan of the pharmaceutical industry. The FDA isn't omnipotent and is therefore imperfect. The question is whose battery of tests is more stringent. There's no way of knowing based on those ads, and by the same token there's no way to know if your tests hold water.

How do I know you aren't abusing your tests? "University", "I do music production". Same banana, man. There are scientists who work their whole lives, finally get published, get subjected to peer review, get awards, and then eventually have their theories supplanted. All this name-dropping is pointless. It cuts both ways. While you strive to bust myths, how sure are you that you aren't just replacing one myth with another? Your numbers may not lie, but what about the conclusions you drew from them? You might get 10 of 10, but have you ever gotten it at a 100% level of confidence (LOC), Arny? These are two very different things, as I assume you know. Well, have you? If you haven't, since you haven't tested the entire human population, what's filling in the blanks behind your very strong assertions? Faith maybe? Conviction?
 
Good enough? I guess that depends on how overkill is defined in this context and where 100 to 1000 times better takes you. What simple blind listening can do is tell you what you can hear, and if you can, what you prefer. And yes, that's good enough, unless what's making your hair stand on end is the way your system looks.

Tim

I knew we understood each other, Tim. Might I add that while good music makes my hair stand on end, only gross sights do the same ;P

Yeah, what the heck is wrong with GOOD ENOUGH anyway? It is what it is. It satisfies. However, merely satisfying doesn't mean you get a second go at it at a later date. ;) ;) ;)
 
Arny, are any of those tests-presentation of yours and AES presentations-discussions around this subject still available?

This is the beginning of ABX as far as the JAES is concerned:

High-Resolution Subjective Testing Using a Double-Blind Comparator
A system for the practical implementation of double-blind audibility tests is described. The controller is a self-contained unit, designed to provide setup and operational convenience while giving the user maximum sensitivity to detect differences. Standards for response matching and other controls are suggested as well as statistical methods of evaluating data. Test results to date are summarized.

Author: Clark, David
Affiliation: ABX Company, Troy, MI
JAES Volume 30 Issue 5 pp. 330-338; May 1982

After that, the literature search is up to you: IEEE, ASA and AES journals or transactions, and on the consumer side Audio magazine, Stereo Review, High Fidelity, as well as The Audio Critic, The Sensible Sound and even Stereophile.
 
Harman measured a 25% scoring difference based on the positioning of the speaker. No wonder, of course, given the room interactions at lower frequencies. Matching speaker locations is therefore important in a blind test, and had someone not done that, but done everything else perfectly, their results would not be valid! Knowing how to create a blind test is usually far harder than actually performing one. It is so easy to create the wrong test.

I mentioned in an earlier message that the first blind test in which I was involved was in 1977. It was of speakers, and the test designer (the late James Moir) had set up three speakers on a turntable so that the one to be listened to could be rotated into the same place in the room as the other two.

For the blind speaker tests published by Stereophile in the 1990s, every speaker was placed in the same position, and we ended up with just two listeners at a time, one in front of the other, to reduce the influence of that variable in a predictable manner. The biggest problem was the curtain in front of the speakers, which modified the room's acoustics so as to favor speakers with an exaggerated top-octave response. Kevin Voecks, then with Snell, had provided us with a curtain of grille cloth that he had found to be as acoustically transparent as any he knew of, but we ended up using this in a column around each speaker to minimize the effect on the room acoustics.

So yes, designing a useful blind test of speakers is not trivial.

John Atkinson
Editor, Stereophile
 
So yes, designing a useful blind test of speakers is not trivial.

I don't think anyone would argue with that, John. And if you're testing for differentiation, not preference, it's rarely necessary. Speakers have the biggest impact of anything in the chain past the recording itself. It is usually very easy to differentiate between them. Electronic components are at the opposite extreme. Assuming they are designed for transparency, not color, the difference between a $10k DAC and a $200 one, while "the difference between cold, sterile reproduction and music" to many audiophiles, will often be completely lost on the simple music lover. The gap between listeners is wide. The gap between the devices often disappears entirely under blind listening conditions. IMHO, the critical audio establishment would do the hobby a great service by beginning each electronic component review with such listening. Qualify it with "to our ears, in X system," but maintain a benchmark component, chosen for its accuracy, for each category, compare it blind to the component under review and take it from there. Can you easily differentiate the Supernova God of Audio DAC from your benchmark? Describe what you hear. Then take the blindfolds off and go about the business of reviewing the build quality, aesthetics, human interface, etc. Four or five audio pros in a room couldn't tell which one was which? Say so.

Of course you might not be left with much magazine content, but other than that, I can't see what is standing in the way of using a more objective method of listening, a more valuable way of reporting.

Tim
 
I don't think anyone would argue with that, John. And if you're testing for differentiation, not preference, it's rarely necessary. Speakers have the biggest impact of anything in the chain past the recording itself. It is usually very easy to differentiate between them.

Differentiate, sure. Provide reliable, repeatable, and accurate assessments? Not so easy.

IMHO, the critical audio establishment would do the hobby a great service by beginning each electronic component review with such listening. Qualify it with "to our ears, in X system," but maintain a benchmark component, chosen for its accuracy, for each category, compare it blind to the component under review and take it from there. Can you easily differentiate the Supernova God of Audio DAC from your benchmark? Describe what you hear. Then take the blindfolds off and go about the business of reviewing the build quality, aesthetics, human interface, etc. Four or five audio pros in a room couldn't tell which one was which? Say so.

Perhaps you have not read my prior posting. When the differences are small but perhaps still important, it is all too easy to organize a blind test as you describe: you get a result where the listener can only identify the DUT 5 or 6 times out of 8, which is not statistically significant; and you declare that there is no difference. But in all honesty, you have proved no such thing. You merely played a meaningless game, even if the null result reinforces what you already believe.
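The arithmetic behind that is a standard one-sided binomial test; a minimal sketch:

```python
# The arithmetic behind "5 or 6 out of 8 is not significant": a one-sided
# binomial test of k correct answers in n trials against guessing (p = 0.5).
from math import comb

def abx_p_value(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5): the chance of guessing that well."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

for k in (5, 6, 7, 8):
    print(f"{k}/8 correct: p = {abx_p_value(k, 8):.3f}")
# 5/8 -> p = 0.363 and 6/8 -> p = 0.145: neither clears the usual 0.05
# criterion, so such a run cannot separate a small real difference from luck.
```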

Of course you might not be left with much magazine content, but other than that, I can't see what is standing in the way of using a more objective method of listening, a more valuable way of reporting.

As I said in earlier postings, when the difference is small, you need to do many, many trials in order to be able to use statistical analysis. This is time- and resource-consuming and impractical for all but large organizations. Even then, as Amir has said, they choose to do this only when strategically it provides an appropriate benefit, for example whether the results justify using a $10 capacitor in a particular place in a circuit rather than a generic $1 cap.
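To see why "many, many trials" is no exaggeration, here is a rough sample-size sketch using the standard normal approximation (the significance level, power, and hit rates are assumed values for illustration):

```python
# Rough sample-size estimate (normal approximation) for detecting a listener
# whose true ABX hit rate is only slightly above chance. Assumed: one-sided
# alpha = 0.05 (z = 1.645) and 80% power (z = 0.842); hit rates hypothetical.
from math import ceil, sqrt

def trials_needed(p1: float, z_alpha: float = 1.645, z_power: float = 0.842) -> int:
    """Approximate n for testing H0: p = 0.5 against H1: p = p1 (one-sided)."""
    p0 = 0.5
    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_power * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return ceil(n)

for p1 in (0.9, 0.7, 0.6, 0.55):
    print(f"true hit rate {p1:.2f}: ~{trials_needed(p1)} trials")
# Gross differences (90% hit rate) need fewer than ten trials, but a barely
# audible difference (55%) needs on the order of six hundred.
```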

Your dismissal above - "Of course you might not be left with much magazine content..." - gives you away: your argument is more about objecting to the existence of magazines like Stereophile because they run counter to your own belief system than about any search for truth. If you object to Stereophile and how it operates, then please don't read it.

John Atkinson
Editor, Stereophile
 
Just for curiosity, I would like to imagine the following scene. The stage is the fantastic system of our administrator, Steve Williams, which is well known to WBF members. His full system, except for the X2 speakers, which would serve as the fixed speakers, would be tested against the best system you can imagine assembled from units costing less than USD 1,000 each, with cheap but reliable interconnects and thick power cables of adequate gauge as speaker cables. The subs would be out of the game.

Do you really believe they will sound the same, if compared using properly conducted blind tests?

I have carried out something like this, comparing Soundlab A1 Px's and Sony ES equipment versus my system, matching levels to 0.1 dB and excluding any possibility of clipping. The differences were so great that any of my victims could easily get a near-100% score. However, as the Soundlabs are a difficult load, I think that the 95 dB, 8 ohm Wilson Audio X2 would be better suited to this imaginary test.
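In outline, the level-matching step can look like this (a hedged sketch of an assumed method, not necessarily the exact procedure used):

```python
# Sketch of level matching: measure the RMS of each chain's output on the
# same test signal, compute the gain trim, and iterate until the two chains
# match within 0.1 dB so loudness cannot masquerade as "quality".
import numpy as np

def gain_trim_db(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Gain in dB to apply to `candidate` so its RMS matches `reference`."""
    rms_ref = np.sqrt(np.mean(reference ** 2))
    rms_cand = np.sqrt(np.mean(candidate ** 2))
    return 20 * np.log10(rms_ref / rms_cand)

rng = np.random.default_rng(1)
noise = rng.standard_normal(48000)
quieter = 0.9 * noise                       # same signal, ~0.9 dB lower
print(f"trim needed: {gain_trim_db(noise, quieter):+.2f} dB")  # ~ +0.92 dB
# Apply the trim at the preamp, re-measure, and repeat until the magnitude
# of the remaining difference is below 0.1 dB before any listening begins.
```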
 
Differentiate, sure. Provide reliable, repeatable, and accurate assessments? Not so easy.



Perhaps you have not read my prior posting. When the differences are small but perhaps still important, it is all too easy to organize a blind test as you describe: you get a result where the listener can only identify the DUT 5 or 6 times out of 8, which is not statistically significant; and you declare that there is no difference. But in all honesty, you have proved no such thing. You merely played a meaningless game, even if the null result reinforces what you already believe.



As I said in earlier postings, when the difference is small, you need to do many, many trials in order to be able to use statistical analysis. This is time- and resource-consuming and impractical for all but large organizations. Even then, as Amir has said, they choose to do this only when strategically it provides an appropriate benefit, for example whether the results justify using a $10 capacitor in a particular place in a circuit rather than a generic $1 cap.

Your dismissal above - "Of course you might not be left with much magazine content..." - gives you away: your argument is more about objecting to the existence of magazines like Stereophile because they run counter to your own belief system than with any search for truth. If you object to Stereophile and how it operates, then please don't read it.

John Atkinson
Editor, Stereophile

I don't object to the existence of magazines like Stereophile and I don't pretend that a small, informal session of blind listening provides proof of anything. If you listened this way, and reported that, say, a group of three sophisticated audiophiles and journalists could not reliably differentiate between DAC X and DAC Y, it would not provide statistical evidence that there is no difference, but it would speak volumes regarding how significant any difference might be. My speculation that it might not leave much to say was just an observation. Once you've determined that the component in question fails to distinguish itself from the benchmark there isn't much point in going on to the kinds of subjective descriptions that are common in contemporary audio journalism. What you would do to fill the pages, I don't know. You do much better now than most. I expect you'd find something.

Tim
 
Just for curiosity, I would like to imagine the following scene. The stage is the fantastic system of our administrator, Steve Williams, which is well known to WBF members. His full system, except for the X2 speakers, which would serve as the fixed speakers, would be tested against the best system you can imagine assembled from units costing less than USD 1,000 each, with cheap but reliable interconnects and thick power cables of adequate gauge as speaker cables. The subs would be out of the game.

When you say units, is that inclusive or exclusive of speakers?

Do you really believe they will sound the same, if compared using properly conducted blind tests?


As long as we restrict ourselves to electronics, speaker wires and interconnects, yes.

As soon as you start changing speakers, all bets are off.

I have carried out something like this, comparing Soundlab A1 Px's and Sony ES equipment versus my system, matching levels to 0.1 dB and excluding any possibility of clipping. The differences were so great that any of my victims could easily get a near-100% score. However, as the Soundlabs are a difficult load, I think that the 95 dB, 8 ohm Wilson Audio X2 would be better suited to this imaginary test.


Just because they are electrostats is no excuse for them being so difficult. Also, they may be difficult by consumer-electronics standards, but they should not be that difficult were professional amplifiers used.
 
Thank you.
Orb

I just remembered this archive of DBT-related stuff:

Acoustical Society of America, Hearing: Its Psychology and Physiology, American Institute of Physics
Andersen, Hans Christian, "The Emperor's New Clothes" Andersen's Fairy Tales, with biographical sketch of Hans Christian Andersen by Thomas W. Handford. Illustrated by True Williams and others., Chicago, Belford, Clarke (1889)
Armitage, Statistical Methods in Medicine, Wiley (1971)
Burlington, R., and May, D. Jr., Handbook of Probability and Statistics with Tables, Second Edition, McGraw Hill NY (1970)
Fisher, Ronald Aylmer, Sir, Statistical Methods and Scientific Inference, 3d ed., rev. and enl., New York Hafner Press (1973)
Frazier, Kendrick, ed., Paranormal Borderlands of Science, Prometheus Books (1981)
Grinnell, Frederick, The Scientific Attitude, Boulder, Westview Press (1987)
Hanushek, E., and Jackson, J., Statistical Methods for Social Scientists, Academic Press NY (1977)
Kockelmans, Joseph J., Phenomenology and Physical Science - An Introduction to the Philosophy of Physical Science, Duquesne Press, Pittsburgh PA (1966)
Lakatos, Imre, The Methodology of Scientific Research Programmes, Vol. 1 , Cambridge University Press (1978)
McBurney, Donald H., Collings, Virginia B., Introduction to Sensation/Perception, Prentice Hall, Inc., Englewood Cliffs, NJ 07632 (1977)
Moore, Brian C. J., An Introduction to the Psychology of Hearing, 3rd Edition , Academic Press, London ; New York (1989)
Mosteller and Tukey, "Quantitative Methods", chapter in Handbook of Social Psychology, Lindzey G., and Aronson, Eds., Addison-Wesley (1964)
Neave, H. R., Statistical Tables, Allen & Unwin, London (1978)
Norman, Geoffrey, R., PDQ Statistics, B. C. Decker Toronto, C. V. Mosby St. Louis, (1986)
Rock, Irwin, An Introduction to Perception, Macmillan Publishing Company, New York NY (1975)
Scharf, Bertram, and Reynolds, George S., Experimental Sensory Psychology, Scott, Foresman and Company, Glenview IL (1975)

Bailar, John C. III, Mosteller, Frederick, "Guidelines for Statistical Reporting in Articles for Medical Journals", Annals of Internal Medicine, 108:266-273, (1988).
Buchlein, R., "The Audibility of Frequency Response Irregularities" (1962), reprinted in English in Journal of the Audio Engineering Society, Vol. 29, pp. 126-131 (1981)
Burstein, Herman, "Approximation Formulas for Error Risk and Sample Size in ABX Testing", Journal of the Audio Engineering Society, Vol. 36, p. 879 (1988)
Burstein, Herman, "Transformed Binomial Confidence Limits for Listening Tests", Journal of the Audio Engineering Society, Vol. 37, p. 363 (1989)
Carlstrom, David, Greenhill, Laurence, Krueger, Arnold, "Some Amplifiers Do Sound Different", The Audio Amateur, 3/82, p. 30, 31, also reprinted in Hi-Fi News & Record Review, Link House Magazines, United Kingdom, Dec 1982, p. 37.
CBC Enterprises, "Science and Deception, Parts I-IV", Ideas, October 17, 1982, CBC Transcripts, P. O. Box 500, Station A, Toronto, Ontario, Canada M5W 1E6
Clark, D. L., Krueger, A. B., Muller, B. F., Carlstrom, D., "Lipshitz/Jung Forum", Audio Amateur, Vol. 10 No. 4, pp. 56-57 (Oct 1979)
Clark, D. L., "Is It Live Or Is It Digital? A Listening Workshop", Journal of the Audio Engineering Society, Vol. 33 No. 9, pp. 740-741 (September 1985)
Clark, David L., "A/B/Xing DCC", Audio, APR 01 1992 v 76 n 4, p. 32
Clark, David L., "High-Resolution Subjective Testing Using a Double-Blind Comparator", Journal of the Audio Engineering Society, Vol. 30 No. 5, May 1982, pp. 330-338.
Diamond, George A., Forrester, James S., "Clinical Trials and Statistical Verdicts: Probable Grounds for Appeal", Annals of Internal Medicine, 98:385-394, (1983).
Downs, Hugh, "The High-Fidelity Trap", Modern HI-FI & Stereo Guide, Vol. 2 No. 5, pp. 66-67, Maco Publishing Co., New York (December 1972)
Frick, Robert, "Accepting the Null Hypothesis", Memory and Cognition, Journal of the Psychonomic Society, Inc., 23(1), 132-138, (1995).
Fryer, P.A. "Loudspeaker Distortions: Can We Hear Them?", Hi-Fi News and Record Review, Vol. 22, pp 51-56 (1977 June)
Gabrielsson and Sjogren, "Perceived Sound Quality of Sound Reproducing Systems", Journal of the Acoustical Society of America, Vol. 65, pp. 1019-1033 (1979 April)
Gabrielsson, "Dimension Analyses of Perceived Sound Quality of Sound Reproducing Systems", Scand. J. Psychology, Vol. 20, pp. 159-169 (1979)
Greenhill, Laurence , "Speaker Cables: Can you Hear the Difference?" Stereo Review, ( Aug 1983)
Greenhill, L. L. and Clark, D. L., "Equipment Profile", Audio, (April 1985)
Grusec, Ted, Thibault, Louis, Beaton, Richard, "Sensitive Methodologies for the Subjective Evaluation of High Quality Audio Coding Systems", Presented at Audio Engineering Society UK DSP Conference 14-15 September 1992, available from Government of Canada Communications Research Center, 3701 Carling Ave., Ottawa, Ontario, Canada K1Y 3Y7.
Hirsch, Julian, "Audio 101: Physical Laws and Subjective Responses", Stereo Review, April 1996
Hudspeth, A. J., and Markin, Vladislav S., "The Ear's Gears: Mechanoelectrical Transduction By Hair Cells", Physics Today, 47:22-8, Feb 1994.
ITU-R BS.1116, "Methods for the Subjective Assessment of Small Impairment in Audio Systems Including Multichannel Sound Systems", Geneva, Switzerland (1994).
Lipshitz, Stanley P., and Vanderkooy, John, "The Great Debate: Subjective Evaluation", Journal of the Audio Engineering Society, Vol. 29 No. 7/8, Jul/Aug 1981, pp. 482-491.
Masters, I. G. and Clark, D. L., "Do All Amplifiers Sound the Same?", Stereo Review, pp. 78-84 (January 1987)
Masters, Ian G. and Clark, D. L., "Do All CD Players Sound the Same?", Stereo Review, pp. 50-57 (January 1986)
Masters, Ian G. and Clark, D. L., "The Audibility of Distortion", Stereo Review, pp. 72-78 (January 1989)
Meyer, E. Brad, "The Amp-Speaker Interface (Tube vs. solid-state)", Stereo Review, pp. 53-56 (June 1991)
Nousaine, Thomas, "Wired Wisdom: The Great Chicago Cable Caper", Sound and Vision, Vol. 11 No. 3 (1995)
Nousaine, Thomas, "Flying Blind: The Case Against Long Term Testing", Audio, pp. 26-30, Vol. 81 No. 3 (March 1997)
Nousaine, Thomas, "Can You Trust Your Ears?", Stereo Review, pp. 53-55, Vol. 62 No. 8 (August 1997)
Olive, Sean E., et al, "The Perception of Resonances at Low Frequencies", Journal of the Audio Engineering Society, Vol. 40, p. 1038 (Dec 1992)
Olive, Sean E., Schuck, Peter L., Ryan, James G., Sally, Sharon L., Bonneville, Marc E., "The Detection Thresholds of Resonances at Low Frequencies", Journal of the Audio Engineering Society, Vol. 45, p. 116-128, (March 1997)
Pease, Bob, "What's All This Splicing Stuff, Anyhow?", Electronic Design, (December 27, 1990) Recent Columns, http://www.national.com/rap/
Pohlmann, Ken C., "6 Top CD Players: Can You Hear the Difference?", Stereo Review, pp. 76-84 (December 1988)
Pohlmann, Ken C., "The New CD Players, Can You Hear the Difference?", Stereo Review, pp. 60-67 (October 1990)
Schatzoff, Martin, "Design of Experiments in Computer Performance Evaluation", IBM Journal of Research and Development, Vol. 25 No. 6, November 1981
Shanefield, Daniel, "The Great Ego Crunchers: Equalized, Double-Blind Tests", High Fidelity, March 1980, pp. 57-61
Simon, Richard, "Confidence Intervals for Reporting Results of Clinical Trials", Annals of Internal Medicine, 105:429-435, (1986).
Spiegel, D., "A Defense of Switchbox Testing", Boston Audio Society Speaker, Vol. 7 no. 9 (June 1979)
Stallings, William M., "Mind Your p's and Alphas", Educational Researcher, November 1995, pp. 19-20
Toole, Floyd E., "Listening Tests - Turning Opinion Into Fact", Journal of the Audio Engineering Society, Vol. 30, No. 6, June 1982, pp. 431-445.
Toole, Floyd E., "The Subjective Measurements of Loudspeaker Sound Quality & Listener Performance", Journal of the Audio Engineering Society, Vol. 33, pp. 2-32 (1985 Jan/Feb)
Toole, Floyd E., and Olive, Sean E., "The Detection of Reflections in Typical Rooms", Journal of the Audio Engineering Society, Vol. 39, pp. 539-553 (1989 July/Aug)
Toole, Floyd E., and Olive, Sean E., "Hearing is Believing vs. Believing is Hearing: Blind vs. Sighted Tests, and Other Interesting Things", 97th AES Convention (San Francisco, Nov. 10-13, 1994), preprint 3893 (H-5), 20 pages.
Toole, Floyd E., and Olive, Sean E., "The Modification of Timbre By Resonances: Perception & Measurement", Journal of the Audio Engineering Society, Vol 36, pp. 122-142 (1988 March).
Warren, Richard M., "Auditory Illusions and their Relation to Mechanisms Enhancing Accuracy of Perception", Journal of the Audio Engineering Society, Vol. 31 No. 9 (1983 September).
 