Is ABX finally Obsolete

Status
Not open for further replies.
Try listening to the same music selections over and over looking for differences. It's not easy by any means.

Hello Gregadd

Just the act of "looking for differences" is bound to skew things. I mean, if you have to look for differences and the difference is not readily apparent, what makes you think what you do "hear" is real and not just a figment of your imagination? How do you see that as a reliable way to conduct a comparison?

Rob:)
 
I get all of that, Orb (well, except for the part about "debiasing sighted evaluations"), and perhaps the audio community can get to methodologies and scientific/statistical rigor at some point. Now? They're mired in something that is the equivalent of believing the earth is flat. A whole bunch of them seem to believe they can compare two pieces of audio gear more objectively when they can see them. They seem to think that removal of a high chance of bias creates a chance of bias. Good methodology designed to remove secondary influences, achieve statistically valid results and analyze the data is hardly relevant until we can get past that point. We're still not sailing east because we'll sail right off the edge of the earth.

Tim
There is a field of study devoted to debiasing. It comes from disciplines and sectors other than audio, but it still applies.
In simple terms, it means understanding the bias that affects the judgement, deconstructing its mechanisms, and then using a framework to counter those factors: a combination of the relevant mental disciplines and an analytical, methodical approach to the activity.
However, this can only be done when the bias models are understood and the person involved is capable of doing what is required, both mentally and practically.
This is really oversimplifying it, but it is possible to break down biases.
What is not known, as far as I can tell, is whether this counters biological/neurological triggers. One example is a study that investigated how the mOFC (medial orbitofrontal cortex), which affects the perception of enjoyment, can be triggered by product price (so the bias here is that price=quality causes a greater trigger of the mOFC, which then heightens the perception of enjoyment and satisfaction).

These studies are not usually free, but as I mentioned earlier in this thread and in others, further work along these lines would be great.
Anyway, this is an unknown either way until more research is done. It would be nice to apply this to audio, and critically to those who can construct a debiasing process mentally, but what is known is that debiasing principles do work.
Cheers
Orb
 
Vincent,
yeah, that is nice work on the speaker study by Sean,
but there is more research that can be done.
Examples include extensive long-term preference comparisons, training listeners to see their own bias results, and then seeing whether it is possible to train them to be aware of both the spatial bias and the product bias and to counter them.
I agree it does not change how sighted studies are flawed (after all, the biases are there and not controlled by default, though other variables also need to be remembered), but it would be interesting if additional factors could be included in further tests.

Cheers
Orb
 
37 pages. I can't believe this.

ABX is NOT obsolete. It does its job perfectly. The problem is there are people out there who forget what its job is, which in plain language is to determine whether differences are small enough to be statistically significant OR insignificant IN RELATION TO a target population as represented by the subjects. That's it. It will not prove any absolutes, whether negative or positive. It is a statistical protocol and as such deals only with probabilities.

Now for the pro ABXers, I wish you'd stop making it look like it is the ultimate test for anything other than what it is supposed to be used for. For the anti-ABXers, I wish you'd stop making ABX look like it's useless. ABX is neither.
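
To make the "statistical protocol" point concrete, here is a minimal sketch (not taken from any post in this thread) of how an ABX score is commonly turned into a probability. It assumes each trial is an independent 50/50 guess when no difference is audible; the function name `abx_p_value` and the 16-trial figures are illustrative assumptions only.

```python
from math import comb


def abx_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided binomial tail: probability of scoring at least `correct`
    out of `trials` purely by guessing (assumed guess rate: `chance`)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )
```

Under these assumptions, 12 correct out of 16 gives a p-value of roughly 0.038, while 9 out of 16 gives roughly 0.40. Neither number proves or disproves an audible difference; it only says how surprising the score would be from guessing alone, and only for the listeners and conditions actually tested.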
 
If we talk science it probably goes like this:
Do you have an experimental design?
No: it is not science
Yes: is the design apt for what you are going to investigate?
No: invalid experimental design
Yes: is the experimental setup correct?
No: invalid results
Yes: is the analysis correct?
No: invalid results
Etc.

Conducting a well-controlled experiment is not easy.
Anybody familiar with meta-analyses knows that reports are often dropped from the analysis because of methodological flaws.
If science were easy, we would all be scientists.

On the other hand, does our judgement improve when our perception is influenced by all kinds of factors that are irrelevant to what it is about: sound quality?

This pic from Sean’s blog demonstrates it nicely:
[Image: BlindVsSightedMeanLoudspeakerRatings.png (mean loudspeaker ratings, blind vs. sighted)]


In both cases the big floorstanders (G,D) are preferred compared with the two smaller ones.
However, in the unsighted test the differences are much smaller.
Shows you how easily our perception is influenced.
We simply use clues irrelevant to the task.
Unsighted testing removes the clues.

Of course we are not scientists, so our experiment is not well controlled and so not really valid.
But I prefer a non-controlled unsighted test over a non-controlled sighted one.
Saves you a judgement based on irrelevant clues.
What is interesting is that the test was not ABX but was focused more on preference- and threshold-related methodologies.
While ABX may not be obsolete, personally I do not think it is the most important test type. We have other test methodologies for comparing two similar things that are difficult to differentiate from a sensory perspective, where, importantly, the anchor-reference point is not an unknown factor that may or may not affect results.

Cheers
Orb
 
37 pages. I can't believe this.

ABX is NOT obsolete. It does its job perfectly. The problem is there are people out there who forget what its job is, which in plain language is to determine whether differences are small enough to be statistically significant OR insignificant IN RELATION TO a target population as represented by the subjects. That's it. It will not prove any absolutes, whether negative or positive. It is a statistical protocol and as such deals only with probabilities.

Now for the pro ABXers, I wish you'd stop making it look like it is the ultimate test for anything other than what it is supposed to be used for. For the anti-ABXers, I wish you'd stop making ABX look like it's useless. ABX is neither.

You certainly have my agreement and cooperation on that one, Jack. Are there any pro-ABXers on this board claiming it proves anything?

Tim
 
Oh yes.
 
Hello Gregadd

Just the act of "looking for differences" is bound to skew things. I mean, if you have to look for differences and the difference is not readily apparent, what makes you think what you do "hear" is real and not just a figment of your imagination? How do you see that as a reliable way to conduct a comparison?

Rob:)


That makes sense. I used to make that argument in court. We have to start somewhere. If we have a system that is not perfect and someone sends us a piece of equipment with the intent to make it better, we have to see if it does. Our initial impression is not always correct. That's why we need repetition. Sometimes improvements are huge, sometimes incremental. I agree that I am not all that interested in small improvements. A professional takes a different attitude. He reports the improvements. You do with it what you will.
 
37 pages. I can't believe this.

ABX is NOT obsolete. It does its job perfectly. The problem is there are people out there who forget what its job is, which in plain language is to determine whether differences are small enough to be statistically significant OR insignificant IN RELATION TO a target population as represented by the subjects. That's it. It will not prove any absolutes, whether negative or positive. It is a statistical protocol and as such deals only with probabilities.

Now for the pro ABXers, I wish you'd stop making it look like it is the ultimate test for anything other than what it is supposed to be used for. For the anti-ABXers, I wish you'd stop making ABX look like it's useless. ABX is neither.

Duct tape is perfect. That is why everybody uses it. :)
 
37 pages. I can't believe this.

ABX is NOT obsolete. It does its job perfectly. The problem is there are people out there who forget what its job is, which in plain language is to determine whether differences are small enough to be statistically significant OR insignificant IN RELATION TO a target population as represented by the subjects. That's it. It will not prove any absolutes, whether negative or positive. It is a statistical protocol and as such deals only with probabilities.

Now for the pro ABXers, I wish you'd stop making it look like it is the ultimate test for anything other than what it is supposed to be used for. For the anti-ABXers, I wish you'd stop making ABX look like it's useless. ABX is neither.

The problem is that the above statement may be oversimplifying the other variables that I keep mentioning: anchor-related bias, reference-point perception for comparison, flipping of perceived constants that are not actually constant, and, critically, the participant's independent-observation decision strategy/difference decision strategy.
If the participant's decision strategy and cognitive behaviour are not fully understood or not applied correctly to ABX, then we have unknowns that make it very difficult to state things the way you do there, Jack.
It would be agreeable IF we had a correlation, in the same test with the same participants, between ABX and a same-different methodology that is specifically designed for sensory comparison of two very similar items and is an internationally approved standard.
However, I have never seen any study that has done this, and no one has linked one so far when I asked whether anyone had this information to hand.

This raises the question: what is the purpose of the same-different methodology, which is a standard and uses AB without X?

Cheers
Orb
 
On to page 37...There's nothing to prove, Greg. Are you evaluating sound? If so, and all other things are equal, nothing is added by sight but the opportunity for bias (and in context, blind can mean something as simple as not knowing if it's the 320kbps file or the lossless one playing, so don't give me a bunch of nonsense about blindfolds and clinical labs and stress...). Nothing. Some sighted tests may get the same results. Some may occasionally even get more accurate results. But they can't possibly be more objective. If you can come up with some kind of sighted evaluation methodology that would reduce the probability of our being influenced by what we see more than not actually seeing it, we'll have a conversation. Until then, this position is too dopey for reason, much less "proof."

Tim

Predictable response. I'll pass on the reciprocal name calling. Again, I'll say it: DBT is the scientific standard. It's ABX we are talking about. Vincent has already rebutted your point about sloppy tests.

The problem is that hardly anybody uses it, including supporters like you. When they do, it is sloppy. When it is used, it proves nothing. The results are incorrectly interpreted. It is too cumbersome and expensive to do a test. Those are the undisputed facts.
If you have some proof that any of that is untrue, please give me a citation or a URL.

I think it would be good if we could come up with a test that had a practical application.
 
PS. What is the purpose of testing a hypothesis if it is not to prove it or disprove it? Intellectual masturbation?
 
Yes, unless you supply me some other ones :):):)


Vincent, I'm glad someone in this thread still has a sense of humor. So if I could put together a sighted test where there was no discernible pattern of picking the larger speaker, you would change your mind?
I refer you to the 12 most significant speakers on this site. Some big, some small. All based on sighted evaluations. I have lots more if you are interested. I am so funny sometimes.
 
A few points:

1. We need to look at the holistic view of bias. Experimenter bias is one thing. The proctor bias is another. But the one least talked about is the bias of the person who created the test. The former two are dealt with in ABX or other blind tests. The latter is not. I have given the example of us having a third-party test conducted double blind to show whether consumers think 64 Kbps compressed music sounded the same as the CD. The results were that better than 90% of a larger population brought in for the test said so! Our bias was to have that finding, so we were exceptionally pleased when the test agency didn't understand the science of compression and picked content that was easy to compress (and what wasn't could be explained away as the "just 10%" of cases where a difference could be heard).

It is trivial to set up experiments that show no difference in just about anything. The trick and challenge is to find a difference. After all, that is what is conclusive, not lack of difference.

To that end, I am always suspicious of any tests Arny puts together, as I automatically assume, for good or bad, that his intention is to find no difference. As a result, he doesn't think hard about what experiment would bring out a difference. In my recent arguments with him on jitter, it came out, as he posted here, that his jitter tests were of very low frequency (60 and 4 Hz). Those jitter profiles are far less audible than high frequency ones due to masking. Since Arny started with the preconception of not wanting to believe in the audibility of jitter, he was immediately satisfied with his tests. If it were me, I would have tried all extremes and then realized that mistake.

2. There is little interest in the industry in settling arguments like this. People who have the money and resources target mass market products. There, blind testing gives us the data we need, as we really couldn't care less that 1% of the population can hear small artifacts. So if blind testing wipes out small differences for whatever reason, it is not material to the purpose at hand.

I mean why else do people have trouble in this thread posting such tests? When I asked Arny to give me ABX tests of amplifiers showing no difference, all he could do was post the Clark experiment from decades back and his own personal anecdotal tests. Surely if there was more commercial interest there would be more tests.

Even when there are tests, if the results do not agree with one's beliefs, they are dismissed. Arny puts forth the Meyer tests of high resolution formats, in which no difference was detected. He pushes aside any testing issues there. I show him another where a few people could hear a difference. He becomes highly critical of any issues in that test and doesn't believe its conclusions.

3. The Internet. Unfortunately, because of forums like this, the argument becomes about who is right and doesn't lose face, rather than what the real data is. I love the discussion we had with Gary about cables, where he measured the characteristics and we simulated them. At the end, when the simulations showed little to no difference, Gary was cool and accepted that data. But try that with anyone else. Saving face is more important than learning anything. Accepting being wrong just doesn't allow for that.

As males, we are all subject to this, unfortunately. So ultimately these threads and arguments are about who can look better and capture that virtual currency of being put on a pedestal. Yes, all of this applies to me just the same :).

This is sad for me though. When we have a debate I like us to at least explore the science. We don't take back much from 100 pages of arguments. But if we learn something, even if the debate doesn't conclude anything or change someone's mind we have at least accomplished something. My wish for WBF is that we get to do that as we have in the past.
 
This is going to require one of those layered things...

Predictable response.

Of course. We should know each other's positions by now.

I'll pass on the reciprocal name calling.

I called you Greg. You're welcome to reciprocate, if you like, but that's not my name.

Again, I'll say it: DBT is the scientific standard. It's ABX we are talking about.

For some reason you seem to want to make much ado about a rather minor distinction. Blind AB is a blind perceptual comparison. Period. DBT is a blind AB comparison in which the person conducting the test is also blind to what's being tested. Useful if the person running the test is in the room with the participants and may inadvertently influence the outcome. An ABX is a blind AB test with a control, that is useful in determining the probability that A and B can be perceptually differentiated at all. These are minor variations on blind A/B, all with a specific purpose. There is good reason to add X if you're trying to determine the probability that a difference can be perceived at all. It would be useless to add the X if the difference was obvious and you were testing for preference. If the person conducting the test is behind a one-way glass and is not talking to the participants, there is no reason for him to be blind to what is being tested; drop the D. When you add the X, or the D, you do it for a reason. None are the "scientific standard" without a reason.
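
As a rough illustration of what adding the X does, here is a minimal sketch of an ABX session, assuming each trial hides either A or B behind X at random and the listener must name which one it is. Everything here is hypothetical: `run_abx`, the `listener` callback, and the two simulated listeners are illustrative stand-ins, not anyone's actual test rig.

```python
import random


def run_abx(listener, trials=16, seed=None):
    """Run `trials` ABX trials and return the number of correct answers.

    `listener` is a hypothetical callback. It is handed the hidden identity
    of X ("A" or "B") only so that a simulated listener can be modelled;
    a real listener would hear X and never see this label.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        x = rng.choice("AB")        # X is secretly A or B on each trial
        answer = listener(x, rng)   # the listener names X as "A" or "B"
        correct += answer == x
    return correct


def guesser(x, rng):
    """Simulated listener who hears no difference and guesses at random."""
    return rng.choice("AB")


def perfect(x, rng):
    """Simulated listener who always identifies X correctly."""
    return x
```

The score from a session like this only becomes meaningful once it is compared against what guessing alone would produce, which is why the result is a probability rather than a proof.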

The problem is that hardly anybody uses it, including supporters like you. When they do, it is sloppy. When it is used, it proves nothing.

Well I think the first phrase in that sentence requires some evidence of some sort. I'm not a "supporter" because I believe adding a control to a blind comparison can be useful, Greg. It's not something that requires my support. I don't use it often. Blind AB is good enough for my purposes, which are not to determine if a difference is audible but whether or not it is significant enough for me to care and, if yes, which one I like. I'm doing it for me and my own purposes in the privacy of my own home. You haven't a clue how sloppy I am or am not, and given that I'm not trying to prove anything, it doesn't matter in the slightest. Last part? Look back. I'm not trying to prove anything. Neither is anyone else with a clue. These are, when very neatly and scientifically conducted, statistical tests. They reveal probabilities, at best. And that has been said so many times by so many "supporters" of blind testing that it must be rattling your ears by now. When are you going to get it?

Tim
 
(...) ABX is NOT obsolete. It does its job perfectly. The problem is there are people out there who forget what its job is, which in plain language is to determine whether differences are small enough to be statistically significant OR insignificant IN RELATION TO a target population as represented by the subjects. That's it. It will not prove any absolutes, whether negative or positive. It is a statistical protocol and as such deals only with probabilities.
(...)

Jack,
I think you are oversimplifying. IMHO the big disagreement is whether ABX really does the job for which it was created: determining whether or not there is a difference. Unless the real problem is deciding what we consider a difference?

Maybe we could do an exercise: what would be the alternative ways of looking for small differences if ABX was patented and the owners of the patent did not allow any use of it?
 
The above is 100% speculation on your part. The effects you are claiming are audible simply aren't. The issues that you are obsessing over are simply figments of your imagination.

If you want to prove your point, first prove your subpoints!
Always fascinating that people who proclaim to be the rational, objective ones can so easily lapse instantly into the point of view that the other person is obviously totally wrong, and from then on work ferociously, using science as a weapon, to attempt to justify their stance, rather than coolly, calmly trying to understand the "truth" of a situation in every aspect. Of course, the history of knowledge and science over the last few hundred years is littered with numerous examples of this, of how the highly revered "experts" in various disciplines always knew the "absolute" truth ...

Frank
 
Well, first off I agree that setting up and running a proper dbt is hard work. I did it a few years ago. And, it was NOT ABX, just blind AB I spose you'd call it.

It took a lot of time and effort, and Amir's point always applies: due to human nature (he calls it the male ego) it will always be interpreted by the reader according to his previously held position.

The good news is that, for a very short while at least, those who actually partook had their eyes opened...a few beers was all it took, however, before it started to become that it was easy to tell them apart...that fish just grew bigger and bigger as the beers went down! (after the test I mean..it became a party)

To accomplish the test (so the listener had full control over what was what and when) we used a switch box at the LP. Well, as you can imagine, to some that was a fatal flaw yada yada.

There was a 'flaw'..or at least a weakness, that I too found when doing the test. As it involved two completely different front ends tied to one unchanging speaker, it also involved cuing up two identical CDs (as best we could time-wise) and flicking between the two. That meant IF, for example, you thought there was a certain passage that might have been revealing, you could not re-listen to that passage on the other...as there were two distinct CDs playing in two different players (did that make sense?)

So, if you wanted, learning was available. I learnt stuff too.

To a very large extent I too am dismayed that most forum discussion is not about learning, it is about maintaining rightness. Seriously, what is the point of that? I realise that is a personal thing, so others may not feel the same way.

To the 'deniers' (is there a better word??) why not TRY a level matched blind test?? I don't care what it is you do, forget ABX or ABC or NBC or CBC, just ensure the two most important factors, 'blind' and 'level matched'.

I assure you (and herein lies the problem I suspect) you WILL be shocked. Can you still hear differences?? YES, you can. People did hear differences in my very own test, that was not a problem for me..it might have been surprising yes, but I am always wanting to learn.

So funnily enough, THAT was one of the major things to come out of my test: a blind test CAN and DOES still produce differences, contrary to the oft-repeated canard. I'd say the differences were very much like that graph of Sean's; they were still there, but by crikey the skewing provided by 'sight' (or knowledge) was gone!

Well, as the speakers were constant, the differences were much smaller than they appear in Sean's graph.

So it's not a matter of 'all amps sound the same', or whatever, it is all about being brave enough to listen with the ears only. A lot are not brave enough to even try, and then they probably go on to trying to market multi kw amps or something. AND, you'd think that even for business considerations you'd at least do a bit of homework, well if you were normal that is.

Amir is completely correct on another matter. I spoke earlier of human nature; well, sadly that applies to all on the net (I like to think I don't exhibit this as much, but I KNOW I do and must!!). Not only do we have the 'deniers' denying some BT results, we also have the 'acceptors' denying some: those that they TOO do not accept because they clash with their pre-existing beliefs. What I mean is they will uncritically accept a result they expect or agree with, yet go to town finding faults in the procedure or methodology of a contrary result.

Why is one group not subjected to the same scrutiny for properness or competence as the other, IF truth is what we are after?

That, I suspect, is what lies at the bottom of a lot of Greg's position (correct me if wrong, Greg, not wanting to put words in your mouth) and others'...well, I sympathise completely. In many ways I find it far more hypocritical when that happens than when Greg (say) says 'I don't accept blind tests so I won't do them'; at least he is being true to his beliefs.

But to accept any (possibly) poorly run 'blind test' simply because it is what you want and to try and find fault with those you don't want..well I see double standards at work there.

Yet Greg, whilst agreeing with you in many ways, I cannot overlook the fact that you (and others) simply refuse to try a blind level matched test.

Look behind the screen, the wizard might not quite be what you expect (poor attempt at a movie reference...but as I don't watch movies it is probably very poor haha).

But equally, I fully understand that to some the magic and myth is what gives enjoyment.
 
Jack,
I think you are oversimplifying. IMHO the big disagreement is whether ABX really does the job for which it was created: determining whether or not there is a difference. Unless the real problem is deciding what we consider a difference?

Maybe we could do an exercise: what would be the alternative ways of looking for small differences if ABX was patented and the owners of the patent did not allow any use of it?

I put a very big qualifier in there, Micro ...IN RELATION TO a target population. So no, I do not think I am oversimplifying. The anti-ABXers get into a huff when a pro-ABXer tries to extend the application of their results beyond that scope. An ABX, AB or DBT is only directly applicable to the actual subjects; the results are more closely applicable, in probability, to the subset of the population that shares the qualities for which the subjects were selected, and less applicable to everybody else. Just as with any statistical test, the selection of respondents must be given much care lest the entire exercise be ruined. Even in a completely random survey it is not sufficient for the respondents to be chosen randomly. They still must fit parameters. For example, take a political poll for your upcoming elections next year. If the question is "Who will you vote for as President?" the respondent must be of voting age and a US citizen. If the question is for Governor, he's got to be registered in that state. A follow-on question about some policy in Georgia, say pertaining to something applicable by State Law, cannot be applied to the rest of the US population any more than your choice for Governor can.

The biggest fallacy of RABID pro-ABXers is their reliance on past studies, which are also statistical studies, as hard fact. Take hearing sensitivity by frequency: countless studies support 20Hz to 20kHz, but if you read these studies' conclusions, even these have qualifications. I have not come across any medical study that comes remotely close to saying that its conclusions are final. The door is always left open for further discoveries because, as Orb says, not every variable has been taken into account.

So let's take the results of an ABX performed in a place with a 50dB noise floor with panelists possessing X criteria. Now put the same equipment in a room of the same dimensions and construction but with quieter HVAC, so the noise floor is 40dB, with the exact same panelists. Will the output of the studies be the same? It is possible, but now it is also possible that it isn't. If the proponent insists that it is, then he is erroneously extending the scope of his findings despite a change in just one variable.

Now let's take it closer to home. I like using Frantz as an example, I hope he doesn't mind. He blind tested himself to see at what point he can no longer tell the difference in cables. In the end 6awg was the less expensive alternative that to him was indistinguishable from his prior more expensive cable. This test worked perfectly for him. There is no need to even contest his findings unless of course he tries to extend his conclusions and say that 6awg IS with finality sonically identical to any higher priced cable in any room, any system and to any listener and is superior to any smaller gauge. There will be people that say even 6awg is too much and 12awg is sufficient and people that will say it isn't sufficient.

I also fundamentally disagree that ABX was created to determine if there is or isn't a difference. I believe it was created knowing that there ARE differences. The purpose is to determine if the differences are statistically significant as far as a sample population is concerned.
 