Double-Blind Testing and the Threshold of Necessity

Oops. Tim, did you mean to say that the data is for the person conducting the test only? If so, then never mind :). At first glance I thought you meant that their test would be convincing evidence for others.
 
I guess I have reached sort of an epiphany regarding DBT. I reserve the right to retract my position later on should it amuse me to do so. It does appear that I can free myself from the whole DBT debate, given Ethan's proclamation that DBT is only required for small differences. At this point, audiophile gear has gotten so good that I am only interested in significant improvements.

Of course someone will always say that if the difference is so great, you won't have any trouble picking it out in a DBT. But because I am not a reviewer, manufacturer, or dealer, I don't have to prove anything.
 
Oops. Tim, did you mean to say that the data is for the person conducting the test only? If so, then never mind :). At first glance I thought you meant that their test would be convincing evidence for others.

All I meant to say about data, this time, is that the article Greg linked to, offering a different perspective, contained none. It was all anecdotal. I think you and I agree on the broad point, Amir. I understand that almost all of the DBT done in audio is informal and not rigorous enough to prove much of anything. My only point is that the simplest blind test, no matter how informal, is at least an attempt to remove the bias of the listener, and stands a pretty good chance of indicating, if not proving, something meaningful. By contrast, and this is definitely opinion, the sighted listening reports of gear-obsessed audiophiles are more likely to be wrong than right. They bring with them far more expectations, far deeper biases, than do the observations of the uninitiated. They carry massive baggage bearing down on any remote possibility of objectivity.

For this reason, I'm not sure we do agree on the value of "experienced listeners," depending on the definition of experienced. I am a very experienced listener in the sense that we speak of it in audiophile circles. But if I conducted a sighted listening test, comparing a 60-watt tube amp driving a pair of passive 3-way speakers to the same speakers in an active design, with individual solid-state amps chosen to drive the loads of the specific drivers in those speakers, I would come to the party with a huge bag of expectations. Without blinding me to what is playing when, it would be fair to dismiss the results before they were gathered. Blind, that experience would become an asset. If even I could not tell you which was which, blind, a statistically significant percentage of the time, that might not prove anything, but it would be pretty strong evidence. It would certainly be enough to make me question my own assumptions. And that's a good thing, questioning our assumptions. A thing we can learn from.

Does the general public assume that the results of a double-blind listening test are proof, regardless of the methodology and statistical significance of that test? They probably would if they cared. Most of the audiophile world, I'm afraid, cares so deeply about the security of their assumptions that they have concluded that such tests are invalid, even when the methodology is sound.

Tim
 
...given Ethan's proclamation that DBT is only required for small differences. At this point, audiophile gear has gotten so good that I am only interested in significant improvements.

I agree with your interest in significant improvements, but I see two problems with Ethan's conclusion: 1) DBT is valuable for many other things besides identifying small differences, including establishing preference. It is valuable for any testing in which you want to remove sighted bias from the results. Even if you're not an audiophile, if you're looking at a cheap-looking pair of bookshelf speakers vs. elegant, obviously expensive floor-standers, and a light goes on above the one that is playing, the pull toward the bigger, more expensive speaker is obvious. DBT removes this bias. 2) What constitutes "small differences" is itself clearly in need of testing. How many times have we seen someone on one of these boards go on at great length about the earth-shaking, night-and-day improvement he got in resolution, microdynamics and soundstage from something that science, engineering, logic and our ears would tell most of us was incremental, if audible at all?

"Small differences," to put it kindly, are in the eye of the beholder. Close your eyes. Trust your ears.

Tim
 
Does the general public assume that the results of a double-blind listening test are proof, regardless of the methodology and statistical significance of that test? They probably would if they cared. Most of the audiophile world, I'm afraid, cares so deeply about the security of their assumptions that they have concluded that such tests are invalid, even when the methodology is sound.
The flip side of that, Tim, is that a number of self-described objectivists have drawn breathtaking conclusions from tests which are deeply flawed, or for which the test details (number of subjects, number of trials, test conditions, etc.) are completely unknown. The number of demonstrably valid tests (e.g., Harman's) is, regrettably, very small.
 
The flip side of that, Tim, is that a number of self-described objectivists have drawn breathtaking conclusions from tests which are deeply flawed, or for which the test details (number of subjects, number of trials, test conditions, etc.) are completely unknown. The number of demonstrably valid tests (e.g., Harman's) is, regrettably, very small.

That is, unfortunately, the other side. Over on hydrogenaudio you would think that anyone running a Foobar ABX plug-in is doing statistically valid research. But as wrong as they are about that, they're probably getting a bit closer to the truth with flawed blind listening than they could get with a perfect method for staring lovingly at our shining emotional and financial investments, with all of our expectations in play, and declaring it good. A matter of degrees, perhaps. But some very important degrees IMO.

Close your eyes. Trust your ears.

Tim
 
I think double-blind tests are quite useful in deflating the scales we use to describe differences. Big differences should be readily audible in double-blind tests regardless of whatever faults we attribute to such tests. No amount of "stress," for example, will make you think two speakers sound identical in a double-blind test. Likewise, if we feel there are such big differences in, say, DACs or amps, then we should hear them that way too. My personal experience is that differences shrink, and shrink hugely, in blind tests. You could argue which one is the reality :). But I suspect it does tilt toward differences being smaller than we envision sighted.
 
A blind test does NOT do that [bolded section mine]. If it did, these discussions would no longer exist.

Well, a blind test will show that none of the people tested, anyway, could reliably hear a difference. But you overlook one very important point. Blind tests have proven a lot, yet the myths still persist. Why? Because the believers refuse to accept results that disagree with their "experience" so they conveniently dismiss blind testing! :D

I don't use cable elevators :) :).

Well that's a relief!

--Ethan
 
I see two problems with Ethan's conclusion: 1) DBT is valuable for many other things besides identifying small differences, including establishing preference.

I agree, and I should use the term blind "auditioning" rather than blind "testing" in the context of establishing preference. Auditioning speakers blind for preference is not really a test in the same sense as proving you hear a difference after demagnetizing your LP collection.

--Ethan
 
Well, a blind test will show that none of the people tested, anyway, could reliably hear a difference.
It wouldn't if the test is wrong.

Then there is the notion of "reliably." In DBT methodology, we throw out results where a listener is right, say, one time out of four, on the assumption that it could have been chance. But what if that vote was right and a difference really was heard? We throw the baby out with the bathwater there.
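
To put some numbers behind the "could have been chance" point, here is a minimal sketch of the arithmetic (my own illustration, not taken from any of the tests discussed here, and assuming a simple two-choice trial where a guesser is right half the time):

from math import comb

def p_at_least(k_correct, n_trials, p_guess=0.5):
    # Probability of getting at least k_correct of n_trials right by pure guessing.
    return sum(comb(n_trials, k) * p_guess**k * (1 - p_guess)**(n_trials - k)
               for k in range(k_correct, n_trials + 1))

# One right answer out of four trials is almost guaranteed under guessing alone.
print(p_at_least(1, 4))   # 0.9375
# Even a perfect 4 for 4 only reaches 0.0625, above the usual 0.05 cutoff.
print(p_at_least(4, 4))   # 0.0625

That is why a 1-of-4 result gets thrown out: with so few trials the test cannot distinguish a genuine detection from a lucky guess, which is exactly the baby-and-bathwater tension described above.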

But you overlook one very important point. Blind tests have proven a lot, yet the myths still persist. Why? Because the believers refuse to accept results that disagree with their "experience" so they conveniently dismiss blind testing! :D
Give me a list of those blind tests, Ethan. Let's see those papers, because I am not seeing much interest from the industry that publishes papers at AES in dispelling the myths about audio. Where would I find the blind tests for the four top-selling speaker cables? Where would I find the blind tests for the five top-selling DACs? Where would I find the blind test for the top ten audiophile amps?

What we get is a random dart thrown at the board. The same science that gives validity to DBT also tells us that you can't take the results of one test and apply them to all DUTs (devices under test) in all situations.

My rule of benchmarking/testing is this: they all suck. The good ones simply suck less. :D I have been through DBTs that cost $200K+, and even then, after the fact, we found serious faults in them.

Besides, there are blind tests that prove the other side too. Recall the test I posted recently in which four people could tell the difference between SACD and DVD-A at 192 kHz. If such tests are always valid, then we have proven something that objectivists scoff at heavily. If such differences show up in double-blind tests (to 75%+ confidence), then surely we can believe in all small differences being proven in blind tests. I have also posted about Swedish blind tests of amps, clearly showing that almost all amps color the sound. Yes, there are far fewer of these than the other way around, but remember, it is easier to set up a test to hear nothing than the other way around! :)
 
I agree, and I should use the term blind "auditioning" rather than blind "testing" in the context of establishing preference. Auditioning speakers blind for preference is not really a test in the same sense as proving you hear a difference after demagnetizing your LP collection.

--Ethan

I think if it's done properly, it's fine to call double-blind preference studies "testing." You can use good testing methods, run enough samples to exceed the margin for error, and there are plenty of preferences in audio that are "testable." The study mentioned earlier, where a preference for the bigger speakers disappeared when the speakers were no longer visible, is a great example. A long series of tests at Harman that Sean has talked about was definitely conducted with appropriate "testing" methodologies, and indicated a tendency toward a preference for flatter frequency response among both trained and untrained listeners. I'd bet that statistically significant numbers were reached, yielding very valuable information for a company manufacturing audio equipment.

Tim
 
It would be interesting to know some of the variables in the speaker test. How far were the listeners from the speakers? How loud was the music played (louder playback may make it much easier to differentiate between speakers)? What was the makeup and treatment of the room? How many listeners were present at one time?

Lee
 
That is, unfortunately, the other side. Over on hydrogenaudio you would think that anyone running a Foobar ABX plug-in is doing statistically valid research. But as wrong as they are about that, they're probably getting a bit closer to the truth with flawed blind listening than they could get with a perfect method for staring lovingly at our shining emotional and financial investments, with all of our expectations in play, and declaring it good. A matter of degrees, perhaps. But some very important degrees IMO.
There's no "matter of degrees" in a binary test result, Tim, and that's what DBT's produce: difference/no difference. If they're not conducted under appropriate conditions, they're just as apt to provide an inaccurate result as any sighted test. For the record, I'm a big believer in DBT's - I'm simply tired of folks using deeply flawed DBT's as evidence of some pet belief while their own personal biases render them blind to those flaws.
 
Let's quadruple-blind test them, then!
 
Hey Lee, I bet if you blindfolded me but let me use my hands, I could tell the difference between a supermodel and a blister bag.
 
There's no "matter of degrees" in a binary test result, Tim, and that's what DBT's produce: difference/no difference. If they're not conducted under appropriate conditions, they're just as apt to provide an inaccurate result as any sighted test. For the record, I'm a big believer in DBT's - I'm simply tired of folks using deeply flawed DBT's as evidence of some pet belief while their own personal biases render them blind to those flaws.

You're part right, Ken. Blind testing can be as inaccurate as sighted testing if it's done poorly enough, though it takes a bit of effort to bias it as much as sighted testing can manage with no effort at all. But you're wrong on the fundamentals. DBT is not limited to difference/no difference. It has been successfully used to test preference for decades. It's really very simple in theory. It is nothing more than a technique to remove the bias of knowing what you're testing as you pass judgement upon it. It can be applied to audio equipment or soft drinks with equal success, and has been, many times. The Coke fan will look at the Coke can and say it tastes better. The audiophile will look at the big, elegantly designed speaker/amp/DAC and believe it sounds better. Remove sight from the equation and often the result changes. Do it under carefully controlled conditions enough times to exceed the margin for error and you have a statistically significant sample, what people who do research for a living call proof, or, at the very least, very compelling evidence.
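
For a sense of what "enough times to exceed the margin for error" means in practice, here is a minimal sketch (my own illustration, assuming a 50/50 split under pure guessing and the conventional 0.05 significance threshold, not figures from the Harman work) of how many consistent responses a blind preference or ABX comparison needs before the result stops looking like a coin flip:

from math import comb

def p_value(k_correct, n_trials, p_guess=0.5):
    # One-sided binomial p-value: the chance of doing this well or better by guessing.
    return sum(comb(n_trials, k) * p_guess**k * (1 - p_guess)**(n_trials - k)
               for k in range(k_correct, n_trials + 1))

def required_correct(n_trials, alpha=0.05):
    # Smallest number of consistent responses that clears the significance threshold.
    for k in range(n_trials + 1):
        if p_value(k, n_trials) <= alpha:
            return k

for n in (10, 16, 25):
    print(f"{n} trials: need {required_correct(n)} consistent responses")
# 10 trials: need 9; 16 trials: need 12; 25 trials: need 18

So a handful of trials proves very little either way; the statistically significant samples described above come from running enough controlled trials that a 70-90% hit rate can no longer be written off as luck.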

Tim
 
Has anyone ever done any studies on whether our eyes help us hear?
 
I think the answer is yes, they have: http://www.scientificamerican.com/podcast/episode.cfm?id=DDD9F1C2-9CDB-8C68-07EEC88298E0F5CE
Here is a quote from the article:

"We think of speech as dependent on auditory perception. But this study eerily shows just how important visual input is.

From this, it's clear that our senses did not develop in isolation, but rather, they work in tandem to form an accurate perception of our world. Here we learn that the position of the lips is key in accurately hearing what someone is saying."
 
