Do blind tests really prove small differences don't exist?

Let's have a show of hands. Who does DBTs for:

1. Coffee.

2. Barbecue sauces.

3. TVs.

4. Bottled water.

5. Coke vs Pepsi.

Just asking :).
 
Let's have a show of hands. Who does DBTs for:

1. Coffee.

2. Barbecue sauces.

3. TVs.

4. Bottled water.

5. Coke vs Pepsi.

Just asking :).

Product developers and marketers.

Tim
 
Arny, the rest of us can't follow conversations you have had in the past and elsewhere. Either tell us what it is about the measurements that is wrong, or drop the topic, please.

You can probably guess what my reservations are all about - correlation with actual reliable listening evaluations.
 
Let's have a show of hands. Who does DBTs for:

1. Coffee.

2. Barbecue sauces.

3. TVs.

4. Bottled water.

5. Coke vs Pepsi.

Just asking :).

The people who develop and professionally review these kinds of products often use DBTs to evaluate them. It turns out that some of the best texts about DBTs are published by academics working with the food processing industries. I've also conversed with employees of well-known food product manufacturers who spent years, even decades, doing DBTs of their products.

BTW, they do DBTs for perceptible differences and also for preferences. They are very concerned about their products changing their flavor while in the distribution chain and while being stored by end-users. You can bet that products like Coke Zero have been heavily DBT'd.

As far as TVs go, I'm under the impression that most of the DBTs are being done by the people who produce components such as LCD displays.
 
Yes it does. Thank you. The only time I am ever bothered by DBTs is when they are made out to be the be-all and end-all. Way too many do this. It's important for me to see who is rational about the topic's pros and cons and who has bought the duct-tape approach to audio truth via DBT hook, line and sinker.

The most extreme cases of that sort of talk that I've seen were related to people who probably never ever did any DBTs of their own.
 
The most extreme cases of that sort of talk that I've seen were related to people who probably never ever did any DBTs of their own.

Yes. I call them Google Geniuses.
 
Let's have a show of hands. Who does DBTs for:

1. Coffee.

2. Barbecue sauces.

3. TVs.

4. Bottled water.

5. Coke vs Pepsi.

Just asking :).

I don't.

I like or I don't like. When I do, I like some more than others. I leave it at that.

If I were still doing product development, though, oh yeah, I would. Lots of it. I'd be trying to please more than one person and to keep costs down to boot.
 
The people who develop and professionally review these kinds of products often use DBTs to evaluate them.
Precisely why I picked those items ;). So do you use DBT to make your household selection of those items?

As far as TVs go, I'm under the impression that most of the DBTs are being done by the people who produce components such as LCD displays.
I don't think any manufacturer performs DBT. However, there have been a number of them conducted by other parties. My good friend Robert, who is an AV dealer in the NY area, runs one every year (or used to at least): http://www.valueelectronics.com/VE HDTV shoot out.htm

Unlike audio, we can freeze video and use instruments to calibrate sets to what they must be. Blind tests are then not necessary if the device meets the performance criteria. When it does not, they can be useful to some extent.
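For context on "calibrate to what they must be": the instruments report a color error against the standard's targets, commonly via the CIE76 delta-E metric. A minimal sketch, with hypothetical target and meter readings (the numbers and patch are illustrative only, not from any particular meter):

```python
import math

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB space."""
    return math.dist(lab1, lab2)

target = (50.0, 0.0, 0.0)      # hypothetical 50% gray target from the standard
measured = (51.2, 1.0, -0.5)   # hypothetical meter reading off the screen
print(f"dE = {delta_e76(measured, target):.1f}")  # ~1.6
# a dE of roughly 3 or less is commonly treated as visually negligible
```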
 
I am not so sure, Amir. Some of the factors I mentioned earlier in this thread have been covered in research and studies involving LCD displays, but I would need to try to find them, as I read them in the past. They could be academic research papers, or research funded by the manufacturers or done in-house; unfortunately I cannot remember.

Cheers
Orb
 
Precisely why I picked those items ;). So do you use DBT to make your household selection of those items?

Amir, this question suggests to me that you have missed the whole point of testing, and can't distinguish between the separate purposes of preference testing and difference testing. If that isn't true, you desperately need to repair your image.

Each item on your list is a slam dunk if one does an ABX test for differences. I have done several of them myself, and I see no reason why the rest are any harder. You don't need an ABX test to tell the difference between, say, LED, DLP, and plasma TV sets. Consumer Reports says that barbecue sauces taste different and that audio amplifiers sound the same within their power capabilities. Are they wrong about one and right about the other?

Some of those items aren't bought based on their differences. A good case in point is bottled water. I buy Aldi's bottled purified water because I'm already in the store, it is just fine, and it's the cheapest thing around. Fool that I am, I don't split hairs when I'm thirsty.

I don't think any manufacturer performs DBT.

If you are talking TVs, then read this:

http://www.gizmag.com/go/4138/

If you are talking in general..

?????????????????

Haven't we been talking factually about which ones do which DBTs?


However, there have a been a number of them conducted by other parties.

Amir, are you having mini-strokes? Where did *that* come from?



My good friend Robert, who is an AV dealer in the NY area, runs one every year (or used to at least): http://www.valueelectronics.com/VE HDTV shoot out.htm

So Amir, according to you, the difference between the DACs in high-end AVRs is in the same range as the difference between an LCD, a plasma, and a DLP HDTV? Are you that blind?

?????????????


Unlike audio, we can freeze video and use instruments to calibrate sets to what they must be.

The audio equivalent of the frozen frame is the carefully selected critical song snippet.
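As an aside, cutting such a snippet is easy to script. A minimal sketch using Python's standard wave module; the file names and snippet position are hypothetical, and a PCM WAV source is assumed:

```python
import wave

# hypothetical file names; start/length mark the "critical" passage
with wave.open("track.wav", "rb") as src:
    params = src.getparams()
    src.setpos(int(30.0 * params.framerate))              # start at 0:30
    frames = src.readframes(int(5.0 * params.framerate))  # take 5 seconds

with wave.open("snippet.wav", "wb") as dst:
    dst.setparams(params)   # nframes is corrected automatically on close
    dst.writeframes(frames)
```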

Equipment with matched FR and distortion below threshold is as well matched as, or better matched than, any two TVs.
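The level-matching half of that claim is straightforward to check before a comparison. A minimal sketch of computing the gain trim needed to match two clips' RMS levels, assuming the samples are already loaded as NumPy arrays (the 0.1 dB target in the comment is common blind-test practice, not a figure from this thread):

```python
import numpy as np

def gain_trim_db(ref: np.ndarray, dut: np.ndarray) -> float:
    """Gain in dB to apply to `dut` so its RMS level matches `ref`."""
    rms = lambda x: np.sqrt(np.mean(np.square(x.astype(np.float64))))
    return 20.0 * np.log10(rms(ref) / rms(dut))

# common practice is to trim until the two clips are within about 0.1 dB
# before any comparative listening is done
```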

Blind tests are then not necessary if the device meets the performance criteria.

That's what people like Ethan and I have been saying all along.
 
Unlike audio, we can freeze video and use instruments to calibrate sets to what they must be. Blind tests are then not necessary if the device meets the performance criteria. When it does not, they can be useful to some extent.

I think this might miss the point. Blind testing is about removing bias. Perhaps it's easier to measure the performance of a TV against what "must be," but that doesn't stop people from believing that their 8-year-old Pioneer Elite is better than a new, calibrated Panasonic VT-30, and seeing it that way every time they look at the logos along with the pictures. Take away their ability to tell which set is which and their preferences might change. Or they might not even be able to consistently identify which set is their Pioneer at all. That is the point of blind testing.

Often the implications of the results can be a bit of a whack to the side of the head, especially in categories in which people are deeply invested in the sophistication of their personal assessments and the superiority of their personal possessions (audio, wine, etc...). So these hobbyist communities protect their beliefs by denying the validity of blind testing. And to a point, they have one. The kind of AB/X testing that goes on over on hydrogenaudio is not statistically valid and proves nothing. But it is still a much better way to compare and evaluate audio components than sighted listening, because it gets to the basic point: it removes bias.

Tim
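To make the statistical-validity point concrete: whether an ABX run shows real discrimination rather than luck comes down to a one-sided binomial test against guessing. A minimal sketch:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring >= `correct` out of `trials` by coin-flipping."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

print(abx_p_value(12, 16))  # ~0.038, i.e. 12/16 clears the usual p < 0.05 bar
```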
 
Video performance criteria are standardized.
 
Video performance criteria are standardized.

Which, of course, doesn't prevent people from thinking non-standard performance is better, any more than substandard noise, distortion and frequency response figures prevent audiophiles from believing their choices are more natural. Standardized audio performance criteria would be great, especially for recording and mastering. But much of the audiophile community would reject them for playback.

Tim
 
So? That doesn't support what Arny is saying. If there are no standard criteria for audio, then logically there is no cut-off point for what should and shouldn't be subjected to a blind test.

To judge the necessity based on what may or may not be obvious differences is pretty subjective in my book. Obvious to whom?
 
So? That doesn't support what Arny is saying. If there are no standard criteria for audio, then logically there is no cut-off point for what should and shouldn't be subjected to a blind test.

Did Arny say that? I missed that one. Cut-off point? If you're testing for preference there is, of course, no cut-off point. If you're testing for audibility, I'd say there is no need to AB/X the audibility of something if it clearly measures within the audible range. If the audible range is in question, test the questionable range. Seems simple enough. Now, who's going to do this, and with what money? That is a serious question. :)

It's all a pretty moot point anyway. Given standards, given comprehensive testing of everything, including AB/X listening tests, those who wish to would still believe what they want to believe. That was the point of my last post. Guys like Arny and me are trying to argue against faith. Those who agree with us will continue to do so. Those with faith will hold to it regardless of how we test.

Tim
 
Amir, this question suggests to me that you have missed the whole point of testing, and can't distinguish between the separate purposes of preference testing and difference testing. If that isn't true, you desperately need to repair your image.
Arny, I will give you this warning once: please do not make this discussion personal -- with me or anyone else. Stay on the technical topic please. There was no need for the last sentence above. Let your logic speak for itself.

Each item on your list is a slam dunk if one does an ABX test for differences. I have done several of them myself, and I see no reason why the rest are any harder.
Then it should be easy to tell us that in your household, you purchase nothing like that without first performing a blind test.

The answer must be an obvious "no," or you would have already given it. :) The fact is that we take these audio discussions far more seriously than we do other things in life. The choice of bottled water at home is not subject to winning a debate on a forum, whereas audio is. So we go on with that conflict in life. Best to be open about it and say so.

You don't need an ABX test to tell the difference between, say, LED, DLP, and plasma TV sets. Consumer Reports says that barbecue sauces taste different and that audio amplifiers sound the same within their power capabilities. Are they wrong about one and right about the other?
I didn't say anything about ABX. I said *DBT*. Why not test three different bottled waters blind and decide which tastes better to everyone in your family?

When I was working at Sony, we had a big fight over which coffee grind to serve to our group, so we actually set up a blind test. That was a useful tool because the most opinionated person was the president of the division! :)
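The mechanics of such a test are simple enough to script. A minimal sketch of the randomization, with hypothetical grind names; for a true double-blind, whoever holds the printed key must neither serve nor taste:

```python
import random

grinds = ["grind A", "grind B", "grind C"]   # hypothetical entries
cups = grinds * 2                            # say, two cups per grind
random.shuffle(cups)
key = {f"cup {i + 1}": g for i, g in enumerate(cups)}
for cup in key:
    print(cup)   # tasters see only the cup codes; the key stays sealed
```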

Some of those items aren't bought based on their differences. Good case in point is bottled water. I buy Aldi's bottled purified water because I'm already in the store, it is just fine, and its the cheapest thing around. Fool that I am I don't split hairs when I'm thirsty.
How do you know it tastes better than your free water coming out of the faucet? Let that water sit for a few hours and then test both. Surely the faucet water is even cheaper. No?

If you are talking TVs, then read this:

http://www.gizmag.com/go/4138/

If you are talking in general..
For what reason? That is a marketing report, not a method to design products. We do that very often in the industry. Let me tell you how it works. You contract with a third party to do a study like they did, but include in it a confidentiality clause. One of two outcomes results:

1. The results show your product is better. You then issue a press release and make a lot of noise about it.

2. The results don't show your product to be better. In that case, you throw it out and go about your business. Or you run it again until you get the results you want :).

I can assure you that Philips does not do LCD research by hiring a third party to run around different parts of the world to see if their products work. They would do it in-house, as Harman, etc. do. Here is a useful excerpt from the article:

"The ‘masked’ comparison was done with all brand names and distinguishing design features covered up, and only the actual screens were visible to the retailers who took part in sessions conducted by Philips from March through to May. These sessions were held across Australia in venues in Sydney, Melbourne, Brisbane, Sunshine Coast, Gold Coast, Perth, Adelaide and Canberra. Brands and models for the comparison were chosen based on recommendations by key retail groups who were asked to nominate the best performers in the categories tested.

The models were displayed without any changes to their “out of the box” settings, as experienced by any consumer purchasing the product, with all models connected via component input with identical cabling.

In the research, a series of still and video clips were played simultaneously on each of the units in both standard and high definition, with participants asked to rank from first to last the screens they felt provided the optimal picture quality.

The figures, which have been independently analysed and processed by the market research company Omnicom Research, showed that the Philips 42PF9966 was chosen as the number one Plasma TV by 74% of participants, while over 60% nominated the Philips 32PF9966 as the number one LCD TV."

You think Philips would prefer to do their research in Australia instead of Eindhoven? I have been to their research lab and there was no blind testing for TVs. I have also visited a number of major LCD manufacturers in Japan and again, none ever talked about blind tests.

That said, I would not be surprised if they do surveys to see what customers like to see in showrooms and homes. That is not the same as doing formal blind tests.

BTW, note how they used the default settings for TVs to evaluate them.

Amir, are you having mini-strokes? Where did *that* come from?
As I said, you have been warned, Arny. No more personal remarks like that. Robert's blind tests are always very popular and heavily discussed on forums. If you watched the videos, you would see Joel Silver in one, for example. It is puzzling that you would find an issue with it one way or the other when a guy has gone through so much trouble to set up a blind test for avid video enthusiasts. It is an expensive and difficult thing to do.

So Amir, according to you, the difference between the DACs in high-end AVRs is in the same range as the difference between an LCD, a plasma, and a DLP HDTV? Are you that blind?
This is the third warning, Arny :(.

Answering anyway, which is better:

1. Black levels that are higher.
2. White levels which fluctuate with video content?

The former is an LCD characteristic, the latter a plasma one. Some viewers may prefer one artifact to the other. You seem to be saying that just because there is a difference, there is no need to determine preference. Yet blind tests are run on speakers with similarly differing characteristics. Maybe part of the confusion comes from thinking the only worthy test is a binary ABX test?

The audio equivalent of the frozen frame is the carefully selected critical song snippet.
That is a very crude approximation. I can freeze a single temporal sample in video: the frame. As a matter of science, we cannot do that with audio. In no way is that the "equivalent." I can spend an hour staring at a video frame, looking at every pixel. I can't do that with audio. I can also do side-by-side tests of video with two displays frozen in time. I can't do that with audio as playing both at the same time doesn't allow us to examine each.

Equipment with matched FR and distortion below threshold is as well matched as, or better matched than, any two TVs.
If our eyes were so forgiving, you would have all the same fights about video :). Remember, our video signals have only 8 bits of dynamic range, or ~48 dB! Imagine how good your audio would sound that way. That is 20 to 30 dB less than cassette tape!
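A quick check of that arithmetic, using the ideal quantization dynamic range of about 6 dB per bit:

```python
import math

dyn_range_db = lambda bits: 20 * math.log10(2**bits)
print(f"8-bit video: {dyn_range_db(8):.1f} dB")    # ~48.2 dB
print(f"16-bit audio: {dyn_range_db(16):.1f} dB")  # ~96.3 dB
```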

That's [device meets the performance criteria] what people like Ethan and I have been saying all along.
That is not my experience with you. I showed that if jitter is below 500 picoseconds peak-to-peak, then we have fully preserved the 16-bit audio samples at 20 kHz. You fought me for weeks, claiming we should accept far higher values. That jitter spec is what I define as "meets the performance criteria." Our CDs have 16-bit samples and 22 kHz bandwidth, so that is the jitter spec it needs to meet to be transparent. And once there, we are free from the requirement of running a blind test. Go above that, and you get the nasty job of characterizing all the jitter profiles in the world!
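For readers who want to sanity-check the 500 ps figure: under the usual worst-case model (a full-scale sine at the top of the band, where timing error times slew rate must stay within one LSB), the number falls out directly. The one-LSB error criterion here is an assumption chosen to reproduce the figure:

```python
import math

bits, f = 16, 20_000  # CD word length; top of the audio band in Hz
# full-scale sine of peak A: max slew = 2*pi*f*A; one LSB = 2A / 2**bits
# peak timing error is half the peak-to-peak jitter, so require
#   (2*pi*f*A) * (t_pp / 2) <= 2A / 2**bits
t_pp = 2 / (2**bits * math.pi * f)
print(f"{t_pp * 1e12:.0f} ps peak-to-peak")  # ~486 ps, i.e. roughly 500 ps
```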
 
(...) If you're testing for audibility, I'd say there is no need to AB/X the audibility of something if it clearly measures within the audible range.

Tim

Tim,

Could you explain what you mean by "measures within the audible range"? I think it is a key point in this debate.
 
I guess if you shop at Aldi's you might enjoy DBTs and all they imply. I once went to an Aldi's store before I knew they weren't a real grocery store. They don't sell any name-brand food. They sell things that look like name-brand foods. Everything is a knock-off of the real items. They have mayonnaise that looks just like Hellmann's. They have ketchup that looks like Heinz. They have mustard that looks like French's mustard. I'm surprised that some of these companies don't get sued for trademark infringement. So, if you are a cheapskate and you don't think you can taste the difference between known quality brands and some cheap knock-off, Aldi's would be a DBT dream. I walked out of Aldi's without buying anything because I don't buy food designed to look like someone else's products when I have no idea how it will taste or if it is even safe.

What I don't understand is why so much ink is spilled over DBTs. If you love them, fine. I don't see why people use them as a bashing tool and try to cajole, bully, and insist that everyone use them to make decisions. At the end of the day, everyone votes with their wallet and buys the gear they want to own. If a DBT showed that new expensive gear doesn't sound better than a Bose Wave radio, who cares, as long as the person who bought the expensive gear is happy with their decision? I think the true purpose of DBTs has been hijacked by some with an agenda for making themselves look good at the expense of others. I just wish that those with a DBT agenda would keep it to themselves, the way religion is best kept. And please, those with a DBT agenda, go shop at Aldi's. You deserve it.
 
Mark, first a request: please don't try to characterize other posters. Let's just discuss the topic at hand.

On why DBTs: in the industry at least, we don't want to develop products and create the wrong thing because of our internal bias. Harman has a great story of a guy who thought Germans wanted a different sound in a speaker than the rest of the world, only to be proven wrong by blind testing.

For consumers, I agree it is being used far more to have a fight than for substantive discussion. My last few posts were aimed at recognizing that. Folks talk about DBT as if it should rule all of our purchases, yet we rarely if ever apply it to other things we buy, even though it may be simpler to do there. Take my water example. That is a far easier test than any audio comparison. So in that sense, yes, you are right that it is a religion more than a technical discussion.
 
On why DBTs: in the industry at least, we don't want to develop products and create the wrong thing because of our internal bias. Harman has a great story of a guy who thought Germans wanted a different sound in a speaker than the rest of the world, only to be proven wrong by blind testing.

One must also be careful that this type of testing doesn't lead to products that fall outside the established preconceptions being dismissed out of hand, either. Or to everything sounding the same. I'm sure the MBL 101s would probably fail the Harman test, yet in the right situation they certainly do things that other speakers can only dream of.

Kind of like formula rock, where a song nowadays has to pass that hit-maker software test to be released :(
 