The Misinformed Misleading the Uninformed -- A Bit About Blind Listening Tests

JackD201 · May 31, 2010

That's Myles' Clark Kent to his power lifting Superman.

Statistics IS NOT an exact science. Even the insurance companies with the best actuarial departments get whopped every so often. It works best with the largest samples and the longer periods. Double Blind tests are evaluated with basic statistical methods not with quantitative data. Double Blind tests test the subjects as well as the objects.

How many times have we seen this before?

Test subject walks in blind, meaning doesn't really know what difference he's supposed to be listening for.

Flunks the test, scoring in the coin flip range.

Now he knows why he flunked and as Gavin puts it, just for grins, asks to take it again.

No longer blind, meaning this result gets thrown out along with the baby and the bathwater, he gets 75% right.

Goes home feeling good since 75% was equivalent to his GPA in high school.

How in heck was this double blind test useful? Here's one use for it........ this is how far we can dumb down our product for this particular market segment. Yeah! That amounts to $80 per unit in savings. Maybe we'll get a bonus.

Seriously, you don't think this happens? It is the bean counter's dream test! "Sir, I recommend we use tweeter Y over tweeter Z because at this price point these poser audiophiles statistically can't tell the difference in the showroom." Whether what we are charged reflects the parts used is an ethical question and no longer a scientific one, so that is outside the realm of double blind tests.

How can the results of a double blind test be conclusive? I'll tell you how: make the difference between the objects being tested so large that even a slug could tell them apart. That's not what this hobby is about, though. It's about small, even tiny differences that have the potential to make big contributions when taken together.

I say let the manufacturer do as many blind tests as he likes. I want a lengthy audition, preferably an in home audition.

FrantzM · May 31, 2010

Astrotoy

In what way is DBT (of which I am NOT a proponent, BTW) not scientific or objective? Again, I fail to understand this statement. That it has failings is certain, but to claim it is non-scientific requires some proof.
I agree with you on the value of listener training. I find it very important, and the use of average or untrained subjects is one of the flaws of several double blind tests conducted in the audio realm, but this does not fault the methodology.
I have been an audiophile for several years and yes, like anyone, I swore by my cables, and to me they made an "enormous" difference until I replicated a test and found out that I wasn't able to recognize my own cables reliably... I would tend to think I am a trained listener... Other trained listeners have also failed this simple blind test, i.e., not knowing what was in the system.

I am somewhat repeating myself, but I wouldn't like this point to be lost: if the debate is simply about voicing a position and not substantiating it, then anything goes; worse, there is no debate. But if someone advances that Cable M is far superior to Cable C, then that person needs to substantiate that assertion. That person should be able to recognize the contribution of Cable M to HIS system. Alas, for cables, we would be very sorry to see how many of us would fail in this regard... That, IMO, should give anyone pause. Notice I moved toward cables, one area where trained listeners fail with alarming regularity when the knowledge is removed...
Some last points. It also bothers me to see the free pass that is given to the Bovine Manure that some manufacturers regularly spew upon us, claiming "scientific" research: the "static" charge from cables on the floor that cable elevators, which are simply porcelain low-voltage electrical insulators, can supposedly remove. Of course these $5 items become "Cable Lifters" and are offered for 20 or more... Or the invocation of quantum mechanics or black body radiation to "explain" what their devices do... There again, removing visual clues, aka blind testing, would help debunk these things, which don't add an iota to the evolution of sound reproduction. It also deters true research in the area. People who don't go along with these BS claims are regularly badgered, often their products shunned... People like John Dunlavy, one of the GREAT designers of audio gear, and others less known (Geddes?) who would not go along with the nonsense.
There is much more to say about this attitude toward blind testing. I will stop at that for now...

Frantz

astrotoy · Jun 1, 2010

Hi Frantz,

Here is what I am saying. Just because you cannot consistently pick out one set of cables from another in a blind test (BT) or DBT doesn't mean there are no differences between the cables or that you do not prefer one to the other. All the results of the BT or DBT show is that in that test you cannot consistently pick out one set of cables or another. In fact your body may be responding positively to one set of cables in a way that does not translate to making a choice in a BT or DBT. For example, most people cannot tell what their blood pressure is without measuring it. However, usually with lower stress, your blood pressure goes down. A relatively simple measurement could be made by listening, BT or DBT, to two different sets of cables and seeing whether one set consistently gives you lower blood pressure readings (or an increase in endorphins, though that would be more difficult to measure at home). If one set did consistently result in a lower BP, then I would conclude that there is a difference that your body is detecting, even if you cannot pick out a difference in cables.

I am not saying that one should not substantiate claims. To paraphrase Carl Sagan, extraordinary claims require extraordinary evidence. However, to claim that DBT or BT is the gold standard and that differences do not exist if they cannot be detected by BT or DBT is just as extraordinary a claim. I would want to see some quantifiable, measurable and repeatable evidence.

I definitely agree with you that there is a lot of "bovine manure" being passed around in the audiophile world. Labelling something audiophile allows manufacturers to charge multiple times the value of something. But this is also true in the fashion world, where a designer name on an item allows pricing many times the intrinsic value or reasonable cost and markup. Take a look at the typical pricing of cables from most companies. There is usually a doubling or some large multiplier of price as one goes from entry level up the chain of product. So the entry level is $100/pr, next is $200, then $400, then $800, then $1600, then $3200, then $6400, then $12800. I would be shocked if the cost of manufacturing each successive line increases in that proportion.

Larry

rsbeck · Jun 1, 2010

Whenever there is a claim of improved sound, there are always three possibilities:

1) There is a real difference, but it cannot be proven.

2) There is no difference in reality, only in the imagination.

3) There is a real difference and it can be proven.

The only way to rule out possibility number two is with a double blind test.

That's just cold hard scientific fact.

If you can pass a DBT then you've proven beyond a shadow of a doubt that the difference is not imaginary, there is a real difference.

If you fail a DBT, number one still exists as a possibility.

Gregadd · Jun 1, 2010

Funny, I was always told you can't prove the negative. Getting back to my steroid example, what about obvious improvements, say the introduction of a subwoofer? Careful, rsbeck, you interchanged two phrases: "real difference" and "claimed improvement".

I might add that some have passed the test, only to have the test administrator attribute their efforts to some type of trick.

FrantzM · Jun 1, 2010

Hi

let me try one last time:
Can we agree that our minds can play tricks on us?
Can we agree that our biases can cloud our evaluations?
Can we agree that removing some or, better, all of these biases (ours) leads to better evaluations?

I think we can agree on these. Why is it that when we know, we can "easily" determine and document our observations (elevated midrange, too bright, transparent, lowered noise floor, etc.) and when we don't know, aka blind, we can't for some components (cables)? What is the reasonable conclusion? What are the reasonable conclusions? What is the most likely?

Frantz

JackD201 · Jun 1, 2010

Hi Frantz,

Yes. Yes. And yes, but if and only if it is the bias itself we seek to extinguish. We definitely can agree.

Why can't we reach reasonable conclusions consistently under blind test conditions?

I think it's simply because we don't listen under blind test conditions in daily life. Most of us put components through the wringer using different source materials as well as making on-the-go setup adjustments. We don't do snapshot listening tests and switch out stuff as quickly as we can. Like any memory, audio memory is anchored on a set of associations and not a direct recollection of the actual stimuli. Without enough time and without being in the proper frame of mind, doubt and eventually forgetfulness will set in before we even get to check the boxes.

Another big factor is being in an unfamiliar environment which includes the room, associated equipment and familiar pieces of music. These three provide vital references for association. Put even the best trained listener outside of his or her environment and he or she might as well be someone picked off of the street.

Are there bogus products out there? Definitely! Are double blind tests the only way to ferret them out? No. I certainly do not think so.

Gregadd · Jun 1, 2010

I'll accept it when it is applied across the board, not just to embarrass reviewers and tweakists. The audio reviewer who prefers ABX/DBT rarely does it himself. A test must be devised where the burden of proof is on both sides. In its present form the challenger has no burden. Let's say I get 5 out of ten right. Yes that is consistent with guessing. It's also consistent with the testee being right 50% of the time. That has to be explained. Any mathematician knows that if you flip a coin 10 times it is extremely unlikely you will actually get 5 heads and 5 tails. You've got to flip a lot of coins to get that result. The reason of course is each flip of the coin is an independent event. That is to say the previous result is completely irrelevant to what's going to happen next.
It would be significant if the person was consistently wrong. Say if he picked the wrong amp 8 times. Even though he was wrong it is probable that he at least thought he was hearing something. I used to tutor in college. I was generally happy when the person got the wrong answer because he followed the wrong formula as opposed to just guessing.

We should remember ABX is only one form of double blind testing.
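The coin-flip figures discussed above can be checked directly. This is a minimal sketch, assuming each trial is an independent 50/50 guess; the function names `prob_exact` and `prob_at_least` are illustrative and not from the thread.

```python
from math import comb

def prob_exact(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k correct answers in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct answers in n independent trials."""
    return sum(prob_exact(i, n, p) for i in range(k, n + 1))

# Exactly 5 right out of 10 when purely guessing.
print(prob_exact(5, 10))      # 0.24609375

# 8 or more right out of 10 when purely guessing. By symmetry this is
# also the chance of 8 or more WRONG, the "consistently wrong" case.
print(prob_at_least(8, 10))   # 0.0546875
```

Under these assumptions, exactly 5 of 10 comes up about 25% of the time and is the single most likely score for a guesser. Getting 8 or more wrong happens by chance only about 5.5% of the time, which is why a consistently wrong listener is more interesting than a 5-out-of-10 one.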

rsbeck · Jun 1, 2010

Gregadd said:
Careful, rsbeck, you interchanged two phrases: "real difference" and "claimed improvement".

No, I posed three possibilities.

I might add that some have passed the test, only to have the test administrator attribute their efforts to some type of trick.

I doubt this has ever happened. Can you post a link or cite a reference?

rsbeck · Jun 1, 2010

Gregadd said:
Let's say I get 5 out of ten right. Yes that is consistent with guessing.

Exactly. If you understand that, there's no reason to go any further.

rsbeck · Jun 1, 2010

The question is -- what would you accept as proof that there is an actual and not imaginary difference?

The only way to prove it is with DBT.

If you want to believe there is still a difference even with a failed DBT, no problem. You just have to admit you can't prove it and you have to admit the possibility that it is imaginary.

The objectivist has to admit that there exists a possibility that the difference is real, but the test failed to prove it.

ggendel · Jun 1, 2010

rsbeck said:
Exactly. If you understand that, there's no reason to go any further.

It all comes down to the confidence level. First is to make sure that only one parameter in the equation is changed so we can rule out all others. Second is to determine whether the experiment is repeatable by someone else following the procedure. Last is the number of "ears" used and the demographics.

I do trust DBT when testing whether a difference can or cannot be detected. Which one is better becomes a completely subjective matter and is a much more difficult test. Speakers are a particularly egregious case, since people's preferences in resonances and volume will make most results questionable. Unless the two are dramatically different (everyone can hear the difference), you might as well just pick your personal preference for yourself and not believe anyone else.
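The confidence-level point can be made concrete. This is a rough sketch, assuming a forced-choice test in which a guessing listener is right half the time, and a conventional 5% threshold. The threshold and the names `p_value` and `min_correct` are assumptions for illustration.

```python
from math import comb

def p_value(correct: int, trials: int) -> float:
    """One-sided chance of scoring `correct` or better by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

def min_correct(trials: int, alpha: float = 0.05):
    """Smallest score whose guessing probability is at or below alpha.

    Returns None when even a perfect score cannot reach alpha
    (the test is too short to be conclusive).
    """
    for correct in range(trials + 1):
        if p_value(correct, trials) <= alpha:
            return correct
    return None

for n in (4, 10, 16, 20):
    print(n, min_correct(n))
```

Under these assumptions a 10-trial test needs 9 correct, 16 trials need 12, and 20 trials need 15, while a 4-trial test cannot reach the 5% level at all. Longer tests can confirm smaller effects, at the cost of listener fatigue.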

Gregadd · Jun 1, 2010

rsbeck said:
I might add that some have passed the test, only to have the test administrator attribute their efforts to some type of trick.

I doubt this has ever happened. Can you post a link or cite a reference?

It was our beloved analog guru Michael Fremer when he was working for TAS. This was before the internet was available. Yes that's how long we have been debating this. My memory is bad, but it was between the VTL and the Adcom. Yes, amplifiers are the most demanding ABX/DBT test. The argument was he must have fixed on some unknown clue.

Gregadd · Jun 1, 2010

rsbeck said:
Whenever there is a claim of improved sound, there are always three possibilities:

1) There is a real difference, but it cannot be proven.

2) There is no difference in reality, only in the imagination.

3) There is a real difference and it can be proven.

The only way to rule out possibility number two is with a double blind test.

That's just cold hard scientific fact.

If you can pass a DBT then you've proven beyond a shadow of a doubt that the difference is not imaginary, there is a real difference.

If you fail a DBT, number one still exists as a possibility.

I can certainly see why you don't want to go any further. The test gives you such a tremendous advantage. The question is could you imagine a situation where a person could actually pick five correct answers and five wrong answers based on what they heard and not based on guesses. If that is possible then you have to deal with it. Other proponents of ABX/DBT have acknowledged this problem. As yet they have not given an answer.
I think you did use both terms.

I knew this would happen. I'm done. This sort of no she did not and yes he did is not going to get us anywhere. I'm going to get out while I still can.

Jay_S · Jun 1, 2010

Gregadd said:
This was before the internet was available. Yes that's how long we have been debating this.

Yes, exactly Greg. At least that long. I remember being on the Prodigy dial-up service and having the same arguments on the audio forum there. Not much has changed, and while this is an important issue, it seems like the same points have been rehashed for decades.

rsbeck · Jun 1, 2010

Gregadd said:
The argument was he must have fixed on some unknown clue.

Sounds more like an isolated off-hand and irrelevant comment to me.

rsbeck · Jun 1, 2010

Gregadd said:
I can certainly see why you don't want to go any further. The test gives you such a tremendous advantage.

The test gives no one an advantage.

The question is could you imagine a situation where a person could actually pick five correct answers and five wrong answers based on what they heard and not based on guesses.

Sure, there are probably several possibilities we could come up with.

If that is possible then you have to deal with it.

No, I don't. I just have to list other possibilities as possibilities. I have done that.
If there is a real difference, then there must be a way to prove it. The only way is to pass a DBT.
DBT is the only way to prove a difference exists. That's just a cold hard fact. If you dispute this, kindly tell me another way.

Other proponents of ABX/DBT have acknowledged this problem. As yet they have not given an answer.

Pardon me, but I am not a "proponent" of ABX/DBT, I simply understand the methodology and the science behind it.

rsbeck · Jun 1, 2010

Gregadd said:
could you imagine a situation where a person could actually pick five correct answers and five wrong answers based on what they heard and not based on guesses. If that is possible then you have to deal with it.

A person gets five right and five wrong. Here are some possibilities:

1) Consistent with random guessing.

2) Ears became fatigued -- have to see if the five right answers were all in the beginning.

3) Subject was able to ID difference correctly 50% of the time, incorrectly 50% of the time -- perhaps a poor subject for test.

None of these possibilities leads to any confidence that the difference being tested for actually exists.

The only way to prove it is to pass the test.

To prove it exists, you must eliminate all possible explanations except one.

That's what DBT is designed to do.

JackD201 · Jun 1, 2010

Just for kicks.

What if just one person out of 10,000 respondents gets it perfectly all the time?
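One way to look at this question is to ask how often a pure guesser would produce a perfect score. This is a back-of-the-envelope sketch, assuming 10 forced-choice trials per respondent. The trial count and the name `prob_any_perfect` are assumptions, not from the thread.

```python
def prob_any_perfect(listeners: int, trials: int) -> float:
    """Chance that at least one pure guesser scores perfectly."""
    p_single = 0.5 ** trials              # one guesser getting every trial right
    return 1 - (1 - p_single) ** listeners

listeners, trials = 10_000, 10

# Expected number of perfect scores from guessing alone.
print(listeners * 0.5 ** trials)          # 9.765625

# Chance that at least one guesser is perfect.
print(prob_any_perfect(listeners, trials))
```

With these numbers, roughly ten perfect scores are expected from guessing alone, so a single perfect respondent among 10,000 would mean little by itself. The interesting case is whether that person stays perfect on a retest, since repeating the feat by luck is far less likely.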

Johnny Vinyl · Jun 1, 2010

http://www.audiocheck.net/blindtests_level.php?lvl=6

My score: 9 - 9 - 8 - 6 - 3 - 1

Now I am playing over a crappy laptop with internal speakers, but I'll bet if I played it through my system the numbers wouldn't be much different. I should play it every day for a week to see how I do.

Just for the record, I listened to each test tone ONCE and then performed the corresponding test.

John

PS~ I don't know what it proves to be honest, but I thought it was an interesting exercise.
