Do blind tests really prove small differences don't exist?

One must also be careful that this type of testing doesn't lead to products that fall outside the established preconceptions being dismissed out of hand, or to the conclusion that everything sounds the same. I'm sure the MBL 101s would probably fail the Harman test, yet in the right situation they certainly do things that other speakers can only dream of.
There is a danger of wrong results in any experiment. No test is perfect. The fact that it has DBT or ABX or WXY in its name is no indication of it being the "truth." This is why even the companies that believe most strongly in DBTs still conduct measurements and sighted evaluations. It is a combination of science that leads to a design (along with market realities of price, style, etc.).

I think at the end of the day, we must realize that we can't get to the absolute in audio. We can only hope to approximate it in our mind.
 
I guess if you shop at Aldi's you might enjoy DBTs and all they imply.

Why?

I once went to an Aldi's store before I knew they weren't a real grocery store. They don't sell any name brand food.

It's all my fault. I've eaten way too much food that had no brand name on it at all. You know, fresh fruit and vegetables from my garden or the farm store out on the edge of town. Artisan bread. None of the microbreweries I visit have any name-brand beer, either, just their own made-up names. I enjoy it anyway.

They sell things that look like name brand foods. Everything is a knock-off of the real items.

Or, food of a certain general nature just happens to look a certain way.

They have mayonnaise that looks just like Hellman's.

I've made my own mayonnaise and it looks a lot like Hellman's, too. Did that make it a knock-off, or the logical consequence of mixing certain ingredients in a certain way? The recipe I used said nothing about Hellman's.

They have ketchup that looks like Heinz.

I guess. More important to me is whether it tastes good on French Fries. But of course these are generic French fries with no brand name, made from fresh potatoes and cooked in oil. Knock offs, or just good food?

They have mustard that looks like French's mustard.

I never buy it. I buy the made-in-Germany mustard that has unground mustard seeds in it. It's also the natural color that mustard is before French's turns it day-glo yellow with FD&C Yellow dye number whatever. Kind of a drab brown.

I'm surprised that some of these companies don't get sued for trademark infringement.

Probably because there was mayonnaise before there was a Hellman's, and there was mustard before there was a French's.

So, if you are a cheapskate and you don't think you can taste the difference between known quality brands and some cheap knock-off, Aldi's would be a DBT dream.

But I can taste the difference, and if the ketchup doesn't taste exactly like Heinz's, I may think to myself: "It's ketchup that doesn't taste exactly like Heinz's, but it still tastes like ketchup." Mostly, I just eat and enjoy.


I walked out of Aldi's without buying anything because I don't buy food designed to look like someone else's products, and I have no idea how their products will taste or whether they are even safe.

I would guess that a person who thinks that anything tasting even slightly different from a certain branded product must be inferior or inedible would be a marketer's dream. But would he be a person capable of thinking for himself?
 
Tim,

Could you explain what you mean by "measures within the audible range"? I think it is a key point in this debate.

There are known thresholds below which given sounds can't be heard. Much of the great debate is about sounds like these.
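
To put a number on "known thresholds": Terhardt's well-known approximation of the threshold in quiet, the curve used in perceptual audio coders, gives the level below which a pure tone can't be heard at all, before masking is even considered. A minimal sketch (my illustration, not anybody's test procedure):

```python
# Terhardt's approximation of the absolute threshold of hearing
# (threshold in quiet), in dB SPL, for a pure tone at f_hz.
# Tones below this curve are inaudible even without masking.
import math

def threshold_in_quiet_db_spl(f_hz):
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (100, 1000, 4000, 16000):
    print(f"{f:6d} Hz: {threshold_in_quiet_db_spl(f):6.1f} dB SPL")
```

The dip around 3-4 kHz (the threshold actually goes below 0 dB SPL there) is why small artifacts in that region are the most likely to be heard.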

Saying that a certain sound *will* be heard is far more difficult because even if it is above the known thresholds, masking may keep it from being heard. There are other concealing influences such as the duration of the sound.

The argument that a given listening test was flawed because it failed to detect a sound that was above a certain known threshold can be very difficult because other influences may have concealed that sound.

If you want to say that a test is unduly insensitive, you first have to do a well-controlled test where the stimulus in question is reliably audible. This very reasonable requirement may call into question some things that JA has said lately about some of his amplifier DBTs.
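
To make "reliably audible" concrete, here is a rough sketch (the 70% figure is hypothetical; this is not a reconstruction of anyone's actual test). It computes how often a listener who genuinely hears a difference on a given fraction of trials would reach the usual p < 0.05 criterion in an ABX run:

```python
# Power of an ABX run: how often a real detector passes the
# conventional one-tailed binomial criterion (chance = 0.5).
from scipy.stats import binom

def min_correct(n_trials, alpha=0.05):
    """Smallest score whose p-value under pure guessing is < alpha."""
    for k in range(n_trials + 1):
        if binom.sf(k - 1, n_trials, 0.5) < alpha:  # P(X >= k)
            return k
    return None

def power(n_trials, p_true, alpha=0.05):
    """Probability a listener with per-trial hit rate p_true passes."""
    k = min_correct(n_trials, alpha)
    return binom.sf(k - 1, n_trials, p_true)

for n in (10, 16, 25, 50):
    print(f"{n:2d} trials: need {min_correct(n)} correct, "
          f"power for a 70% detector = {power(n, 0.7):.2f}")
```

With 16 trials, a listener who hears the difference 70% of the time passes less than half the time. A null result from such a test says little until the test is shown to catch a difference that is known to be audible.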
 
Amir,
your mentioning visual displays is an interesting topic that got me thinking. Originally I could not see a way for it to be applicable to audio, apart from the DBT point you mention, which is a good one.

However, I just realised that one aspect that may be comparable to subtle audio, in terms of perception, is the banding seen in the shading of some films/products/technologies.
In a way this could be made comparable to subtle audio differences: the watcher is not allowed to pause the clip (so, like audio, it is a stream), and the severity can range from very noticeable to very subtle.
I appreciate DBTs have been done on this with specific scopes, but what would be interesting is an ABX involving two products or two clips that suffer from banding: sighted, the watcher is confident they can identify differences in the moving clip; then replicate the usual audio ABX that has been discussed at length, where the watcher must identify X against A or B, with the restriction that the clips cannot be paused.
Of course the banding would have to be subtle on one, or present on both with one slightly worse than the other (more ideal, as this would replicate the uncertainty heuristic and, IMO, is closer to how accurate we probably are at identifying audio in ABX selections).

Still, I appreciate this is not truly comparable to subtle, small differences in audio perception, but I thought I would post it because, to me, this is possibly the closest we get to blind ABX selection across audio and video. Just food for thought, maybe.
However, I understand that the original context was around blind-testing principles and process, which is a more valid point.

Thanks
Orb
 
Did Arny say that? I missed that one. Cut-off point? If you're testing for preference there is, of course, no cut-off point. If you're testing for audibility, I'd say there is no need to AB/X the audibility of something if it clearly measures within the audible range. If the audible range is in question, test the questionable range. Seems simple enough. Now, who's going to do this, and with what money? That is a serious question. :)

It's all a pretty moot point anyway. Given standards, given comprehensive testing of everything, including AB/X listening tests, those who wish to would still believe what they want to believe. That was the point of my last post. Guys like Arny and me are trying to argue against the validity of faith. Those who agree with us will continue to do so. Those with faith will hold to it regardless of how we test.

Tim

I think we first have to define what the emphasis of the tests really is. Is it quantitative, qualitative, or, hardest of all, the qualitative assessment correlated with a quantitative difference? Take your example of something that might clearly measure within the audible range: one test would measure its existence accurately, but even Arny says duration matters as much as size. Wives say that too. Har har. Sorry, I couldn't resist ;)

Given repeating cycles, however, the chances of picking up even short-duration anomalies increase. What may have been masked on the first run can often be picked up once the test subject knows what to focus on, and rises again when the subject is told to specifically listen for it. It's no different from two guys reading a page of a book. A literature professor might be attuned to subtle elements like foreshadowing, a news editor more attuned to grammar and spelling on the first pass, even if both are equally adept at spotting both. Every subject has his or her predisposition. ABX removes the bias but not the predisposition. That can only be addressed after the "reveal." I've come to understand that this is the core of Amir's exploration: are there OTHER biases that might not be removed by an ABX? I think so.

This of course doesn't mean that, given enough cycles, we aren't going to get false positives. That still isn't a bad thing, because if one looks at the tabulation of the results with respect to positive vs. negative, the pattern is usually more important than the count stated as percentages.
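
To illustrate that last point with made-up but representative numbers (my sketch, not data from any actual test): under a 0.05 criterion per run, repeating cycles makes at least one spurious pass increasingly likely, which is exactly why the pattern across runs tells you more than any single run's percentage.

```python
# Chance of at least one false positive across repeated 16-trial
# ABX runs when there is genuinely nothing to hear.
from scipy.stats import binom

n_trials = 16
# 12-of-16 correct is the usual p < 0.05 pass criterion at 16 trials.
per_run_fp = binom.sf(11, n_trials, 0.5)  # P(X >= 12) by pure guessing

for runs in (1, 5, 10, 20):
    p_any = 1 - (1 - per_run_fp) ** runs  # at least one lucky pass
    print(f"{runs:2d} runs: P(at least one chance pass) = {p_any:.2f}")
```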

Then there's the matter of faith. It cuts both ways. Every philosophy has got its holes. Anybody can get in there and poke around until the foundations are shaken but really, unless one is sadistic and masochistic at the same time, why do that?
 
I've made my own mayonnaise and it looks a lot like Hellman's, too. Did that make it a knock off, or the logical consequence of mixing certain ingredients in a certain way? The recipie I used said nothing about Hellmans.

You missed my point entirely and then ran to extremes that have nothing to do with what I said. It's not that the mayonnaise itself looks like the mayonnaise inside of Hellman's; it's that they copied the colors of the label and the look of the Hellman's jar to make it appear to be Hellman's. Your other examples all equally missed my point. Microbreweries aren't trying to mimic commercial beers by copying their labels and appearing to be something they aren't. I didn't say that French's makes the best mustard, just that Aldi sells a brand that is trying to look just like French's.
 
It's not that the mayonnaise itself looks like the mayonnaise inside of Hellman's; it's that they copied the colors of the label and the look of the Hellman's jar to make it appear to be Hellman's.

There is some irony in your dismissing a store based on the fact that some of its products have labels resembling a name brand's label. An open-minded consumer might taste the generic product against the brand-name item and use that information to decide on the generic product's value (and the store's usefulness). If the consumer was concerned that he might be swayed by the labels, he might arrange a blind taste test.

Thanks for providing an example of the practical use of a blind test.

Bill
 
I think at the end of the day, we must realize that we can't get to the absolute in audio. We can only hope to approximate it in our mind.
Taken to its rhetorical extreme, this statement opens the door to absolutely every single piece of snake oil ever made. It allows for people to believe they really did hear an improvement in soundstage when they said 3 Hail Marys before connecting their speaker cables, installed a Tice clock, used an intellichip, etc., etc.

This statement of yours is dangerous and absolutely needs context. This statement is true in some situations, not true in others. That's the problem with generalizations. But if we're being intellectually honest, we don't just settle for the general. Instead we focus on the specific. Amir, I love you man, but I'm afraid this is just another example of your dismissing the scientific method.
 
There is some irony in your dismissing a store based on the fact that some of its products have labels resembling a name brand's label. An open-minded consumer might taste the generic product against the brand-name item and use that information to decide on the generic product's value (and the store's usefulness). If the consumer was concerned that he might be swayed by the labels, he might arrange a blind taste test.

Thanks for providing an example of the practical use of a blind test.

Bill

Bill, it's not some of the products; it's basically everything they sell. And yes, I dismissed the store based on the fact that I don't buy food products from unknown companies who are trying to cash in on famous-label foods by mimicking their labels and containers. And like I said, Aldi's would represent a DBT dream because of the products they sell. I just don't want to participate.

I also refuse to shop at WalMart, but that is for an entirely different set of reasons.
 
Taken to its rhetorical extreme, this statement opens the door to absolutely every single piece of snake oil ever made. It allows for people to believe they really did hear an improvement in soundstage when they said 3 Hail Marys before connecting their speaker cables, installed a Tice clock, used an intellichip, etc., etc.

This statement of yours is dangerous and absolutely needs context. This statement is true in some situations, not true in others. That's the problem with generalizations. But if we're being intellectually honest, we don't just settle for the general. Instead we focus on the specific. Amir, I love you man, but I'm afraid this is just another example of your dismissing the scientific method.

So what does it mean when one hears a Tice clock and doesn't like the sound?
 
Tim,

Could you explain what you mean by "measures within the audible range"? I think it is a key point in this debate.

Really? I didn't think the range of human hearing had been in question for decades.

Tim
 
Tice clock... An acquaintance had one ages ago and raved about the impact it had. When I went to listen, he had it set up on a table at the opposite end of the room from the stereo system. I suggested it would work much better plugged into the same outlet group as the system, so I took it and placed it on the floor near his rack instead. After listening, he agreed it sounded much better that way. I think it took him a few days to discover I had never plugged it back in...
 
Tice clock... An acquaintance had one ages ago and raved about the impact it had. When I went to listen, he had it set up on a table at the opposite end of the room from the stereo system. I suggested it would work much better plugged into the same outlet group as the system, so I took it and placed it on the floor near his rack instead. After listening, he agreed it sounded much better that way. I think it took him a few days to discover I had never plugged it back in...

That kind of confirms what I said :)
 
Arny, the rest of us can't follow your conversations that have occurred in the past and elsewhere. Either tell us what it is about the measurements that is wrong, or drop the topic, please.

Arny felt that my jitter tests were invalid because the Miller-Dunn J-Test signal isn't dithered. I explained that the elegance of this test signal is that because both signal components are even-integer fractions of the sample rate, there is no quantizing distortion. Thus everything that you see between the signal-bins in an FFT plot of the DUT's analog output while processing this signal stems from the DUT (provided your analyzer's ADC has a greater resolution than the noisefloor of the DUT).
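
For readers who haven't seen the signal, here is a minimal sketch of the construction at 48kHz (an illustration put together for this thread, not Stereophile's actual measurement code):

```python
# Sketch of a J-Test-style signal at 48 kHz: an Fs/4 tone plus an
# Fs/192 square wave toggling the LSB. Both are exact integer
# divisions of the sample rate, so the 16-bit sample values are
# exact and no dither is needed.
import numpy as np

fs = 48000
t = np.arange(fs)                       # one second of samples

# Fs/4 tone: successive samples are 0, A, 0, -A -- exactly
# representable as 16-bit integers (A chosen here for illustration).
A = 2 ** 14
tone = np.round(A * np.sin(2 * np.pi * (fs // 4) * t / fs)).astype(np.int32)

# Fs/192 square wave (250 Hz) toggling the least significant bit:
# 192-sample period, flipping every 96 samples.
lsb = ((t // 96) % 2).astype(np.int32)

signal = tone + lsb

# FFT of the digital signal: all energy sits on exact bins (12 kHz,
# plus the 250 Hz square wave and its odd harmonics), so anything
# between bins in the DUT's analog output must come from the DUT.
spectrum = np.abs(np.fft.rfft(signal / 2.0 ** 15)) / len(t)
db = 20 * np.log10(spectrum + 1e-12)
print("Largest bins (Hz):", np.argsort(db)[-6:])
```

Every component lands on an exact FFT bin and on exactly representable 16-bit sample values, which is why the space between the bins starts out clean.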

John Atkinson
Editor, Stereophile
 
Arny felt that my jitter tests were invalid because the Miller-Dunn J-Test signal isn't dithered. I explained that the elegance of this test signal is that because both signal components are even-integer fractions of the sample rate, there is no quantizing distortion. Thus everything that you see between the signal-bins in an FFT plot of the DUT's analog output while processing this signal stems from the DUT (provided your analyzer's ADC has a greater resolution than the noisefloor of the DUT).

John Atkinson
Editor, Stereophile
Was this also covered in Julian Dunn's Measurement Techniques for Digital Audio?

From what I remember, Julian developed certain models and algorithms, using and requiring specific signals, to calculate jitter and its real-world effects.
Long time ago, so I am probably very wrong on this :)

Cheers
Orb
 
Taken to its rhetorical extreme, this statement opens the door to absolutely every single piece of snake oil ever made. It allows for people to believe they really did hear an improvement in soundstage when they said 3 Hail Marys before connecting their speaker cables, installed a Tice clock, used an intellichip, etc., etc.

This statement of yours is dangerous and absolutely needs context. This statement is true in some situations, not true in others. That's the problem with generalizations.
I provided the context, Ron. You took it out when you quoted me :). Let me replay it.

Myles asks me: if Harman is so great at blind testing of speakers, how come the MBL 101s outperform theirs? What am I supposed to tell him? I am pretty sure Harman did not blind test their speakers against the MBLs. Does this mean:

1. The Revels outperform them anyway? On what basis am I supposed to make that claim? Yes, Harman has done extensive research into speaker preferences. But how can we take this to the extreme and answer a specific question like he asked?

2. Myles is wrong and the MBLs aren't as good as the Revels? How do I prove this fact?

One of the biggest limitations of our blind testing of audio is that we do so little of it relative to the incredible array of equipment out there. We can say many components sound the same, but that still leaves so much equipment untested.

The reality, then, is that we rely on sighted evaluation by experts to augment our Swiss cheese of knowledge here. There is no other practical way to do it.

Keep in mind that I don't advocate things I can't measure or prove with science. That is the prerequisite in my book. So the Tice clock or whatever is not something I am about. If I can show with objective data that a difference exists, then a lot of snake oil is taken out of the equation.
 
Also, is it not fair to say many of these blind tests are designed to examine listener behaviour/preferences and, importantly, specific parameters relating to specific designs, since nearly all of these products involve some kind of compromise (electronics being more of a debate area for some, I accept)?
So in this context Harman may have investigated different parameters than MBL, because the compromises they accepted may have had different scope priorities; hence their radically different designs.
These compromises could also apply to listener behaviour/preferences/thresholds in terms of applying priorities to the scope of the design, as seen even in past papers involving Sean and Floyd.
In a way I feel the many discussions relating to AB/X tests with absolute identification and matching do not reflect real-world practices and developments when considering subjective perception; the focus there being on subjective preference, error-related thresholds, behaviour, ratings of defined variables or parameters, etc.


Thanks
Orb
 
"Could you explain what do you mean by "measures within the audible range" I think it is a key point in this debate. "

Really? I didn't think the range of human hearing had been in question for decades.

Tim

I am addressing the term "measures" not the audible range. :)

Your original statement was " If you're testing for audibility, I'd say there is no need to AB/X the audibility of something if it clearly measures within the audible range."
 
I provided the context, Ron. You took it out when you quoted me :). Let me replay it.

Myles asks me: if Harman is so great at blind testing of speakers, how come the MBL 101s outperform theirs? What am I supposed to tell him? I am pretty sure Harman did not blind test their speakers against the MBLs. Does this mean:

1. The Revels outperform them anyway? On what basis am I supposed to make that claim? Yes, Harman has done extensive research into speaker preferences. But how can we take this to the extreme and answer a specific question like he asked?

2. Myles is wrong and the MBLs aren't as good as the Revels? How do I prove this fact?

One of the biggest limitations of our blind testing of audio is that we do so little of it relative to the incredible array of equipment out there. We can say many components sound the same, but that still leaves so much equipment untested.

The reality, then, is that we rely on sighted evaluation by experts to augment our Swiss cheese of knowledge here. There is no other practical way to do it.

Keep in mind that I don't advocate things I can't measure or prove with science. That is the prerequisite in my book. So the Tice clock or whatever is not something I am about. If I can show with objective data that a difference exists, then a lot of snake oil is taken out of the equation.

If Myles said that the MBLs outperform the Revels, and Sean did not say the Revels outperform the MBLs (he would say something much more specific, wouldn't he?), then the burden of proof is on Myles. I could say my old LaScalas outperformed Wilson Sophias (and by a couple of parameters they did). That would be nothing more than my opinion, stated as fact, and indicative of the problem, nothing else.

Tim
 
