Tips for ABX Tests

To be a quack you have to be practicing some profession. Just curious: what profession am I a quack in? What friend are you talking about?
Don't answer that; I am not interested in trading insults. Please feel free to continue alone. It says more about you than it does about me.

You feel your reputation has been injured? I don't even know who you are.
Let me afford you this opportunity, on behalf of the members, to tell us who you are. I do know a guy named j.j. who set up some sort of A/B test with a broken amp and appeared with Ethan in a video. Is that you?

Maybe you can tell us your real name, degrees if any, publications, whether you have produced any audio products, etc. If you have already done that somewhere, maybe you could give me a link. Right now you are just a guy trying to shout me down.
 
Greg, the scientific community has firmly established what it believes in and what it doesn't. You have no prayer of changing its stand by arguing in a forum. It simply is the case that sighted results won't be trusted whatsoever. It is not JJ who is saying that. It is not a couple of people who are saying that. It is the entire community that works under the auspices of *audio science*. The best course of action is to say that you disagree and move on. You are not going to get remotely close to any reasoning that would change any person in that community. And if you are trying to convince people in your own camp, well, they don't need any convincing because they are already in your camp!
To the extent that you accurately characterize what I am trying to do here, you are correct. The goal of a test is to judge sound and nothing else. Could it be that the reason we obtain so many inconclusive results in ABX is because the test itself is a significant factor? For example, how is it that a guy who knows as much about digital as you could not rank the copies of digital in Ethan's test? Clearly you knew what to look for. Opus 211 knew what to look for and was able to do it easily.
 
To the extent that you accurately characterize what I am trying to do here, you are correct. The goal of a test is to judge sound and nothing else. Could it be that the reason we obtain so many inconclusive results in ABX is because the test itself is a significant factor? For example, how is it that a guy who knows as much about digital as you could not rank the copies of digital in Ethan's test? Clearly you knew what to look for. Opus 211 knew what to look for and was able to do it easily.

Could it be? Of course. It almost certainly is. ABX could be a significant factor that is skewing the results inaccurately in favor of false negatives, or ABX could be a significant factor that is removing much of the opportunity for the comparison to be skewed by bias, illuminating entire categories of false positives. And it is still unsettled, because virtually the entire scientific community, through many trials, the collection of much statistical data, and peer review, has decided it is the latter, whereas internet audiophiles, faced with results that slay their sacred cows, have decided it is the former. However shall we resolve this?

Tim
 
So Amir, do you think maybe you could remember what you did on Ethan's test and see if any of my tips might apply?

I'll locate the thread and post a link.
http://www.whatsbestforum.com/showthread.php?7502-Converter-loop-back-tests

1. Do you think the test was compromised because Ethan had already taken a position and designed a test to prove it?
2. Do you think the test examples were properly prepared and capable of revealing the differences you expected to find?
3. You expressed some doubt about whether Ethan correctly represented the answers. Do you still feel that way?
4. Do you regret taking the test?
6. Many members neglected to participate, notably some diehard ABX advocates. Do you have an opinion on that?
7. Is there anything you would have changed about the test or about how you conducted the evaluation? Would you do it again?
8. Do you feel memory was a significant factor? You did have to listen to A, then compare it to B, and then to C, etc. Did that present any problem for you?
Please make any other comments you find appropriate.

From time to time I need to repeat that I recognize the validity of so-called blind test scientific evidence. To paraphrase that great Hall of Fame pitcher for the Baltimore Orioles, Jim Palmer: when he criticizes a pitcher, he adds that life is easy in the design but difficult in the execution. I'm pretty sure he did not mean to throw a belt-high fastball over the middle of the plate.
 
Could it be that the reason we obtain so many inconclusive results in ABX is because the test itself is a significant factor?
No, the most likely explanation is that we imagine far more things in sighted tests than are actually there. This explanation has a lot of proof points behind it. Your alternative explanation is still waiting on one :).

All tests have a degree of error. The degree of error in sighted tests is massive. I can easily make a person think two identical tracks played in identical ways sound different by simply telling them something before the test: "Now listen to this version that is going through this amazing new DAC for $10K." ABX tests, depending on the level of rigor, can also be faulty. We could, for example, pick material that is not revealing. We can look into the test conditions to detect such problems. On balance, the scientific community would much rather have that problem than that of sighted tests, where anything goes as far as results.

For example, how is it that a guy who knows as much about digital as you could not rank the copies of digital in Ethan's test? Clearly you knew what to look for. Opus 211 knew what to look for and was able to do it easily.
Not sure what this has to do with anything. Both Opus and I used mechanical means to detect the differences, and that was my issue with Ethan's test: that on the Internet you can't get reliable data. I got it wrong, by the way, in an unexplained manner. I measured signal-to-noise ratio and ranked them by that. Interestingly, that was not the right order. Not sure why that would be the case, other than the tool making the measurement being faulty (my best guess). Did you take the test, or are you willing to take it now?
 
1. Do you think the test was compromised because Ethan had already taken a position and designed a test to prove it?
There is a possibility there. JJ has been asking you to read BS.1116. Picking revealing content and training are prerequisites for a proper blind test. I don't think Ethan or even I know what content would be most revealing over many loops of AD and DA.

It is also possible that his conclusion is right: that we took one distortion that audiophiles think is huge, multiplied it by 10 or whatever, and folks could still not count the steps. For the clips he picked, that is the likely conclusion.

2. Do you think the test examples were properly prepared and capable of revealing the differences you expected to find?
See above. The question back at you is whether you would have accepted that 10 loops of AD and DA would have no effect on *any* music sample such as the one he picked. If it can be so transparent in that song, do you think it will all of a sudden be revealing in all or most others?

3. You expressed some doubt about whether Ethan correctly represented the answers. Do you still feel that way?
See my previous post.

4. Do you regret taking the test?
Never. I never regret taking these tests, even if one shows me to be deaf :). I have fully documented cases where, despite my claimed expertise and hearing acuity, I was caught rating identical tracks differently in blind tests, and with conviction no less! All because I was insisting that was the outcome (i.e. I assumed they were different). The solution to the problem you describe -- an audiophile made to look a fool by not being able to tell differences blind -- is to accept that our hearing is never as good as we think. If you want to dispute that, then run a private blind test. You will quickly realize that it is the case.

6. Many members neglected to participate, notably some diehard ABX advocates. Do you have an opinion on that?
I explained the reason above. Males are terrible at this: we don't want to be shown to be wrong in a public forum. I am not scared of that. Neither should you be. Go through a lot of these tests and then ponder what they mean. If you still think they mean nothing, then fine. But I am confident that if you did this, and did it properly, it would change your outlook on audio evaluation. Unless you do not have a logical mind :).

7. Is there anything you would have changed about the test or about how you conducted the evaluation? Would you do it again?
Yes, I would remove the chance of mechanical analysis. It is hard to do but can be done. Second thing I would do is try damn hard to find one song where it does make a difference. That would show me that my test is correct. Maybe such a track does not exist, but my experience in other domains shows that one usually exists. 10 generations should cause audible distortion somewhere.

8. Do you feel memory was a significant factor? You did have to listen to A, then compare it to B, and then to C, etc. Did that present any problem for you?
Please make any other comments you find appropriate.
I was not close to any listening station to do any evaluation of that type. I was at our vacation house with a laptop. So I performed my mechanical analysis instead.

From time to time I need to repeat that I recognize the validity of so-called blind test scientific evidence. To paraphrase that great Hall of Fame pitcher for the Baltimore Orioles, Jim Palmer: when he criticizes a pitcher, he adds that life is easy in the design but difficult in the execution. I'm pretty sure he did not mean to throw a belt-high fastball over the middle of the plate.
The question is what you do with that belief. If you throw it out the window when you evaluate the next audio product, then you don't really hold the belief. I do a lot of sighted tests, as that is faster and far more convenient. But if I am going to offer an opinion about those differences, I do my best to set up at least an informal blind test. If the blind test results are very different, I remain cautious in drawing conclusions.
 
Thank you for taking the time for such an informative and candid reply.

I don't do A/B comparisons of the type mentioned above. Perhaps I should. I did listen to the tracks and found they were too short for me to form an opinion. Moreover, when I got a fix on the cello, the other music wiped it out. I just could not remember what A sounded like long enough to compare A to B, C, and D.

I am not threatened by science. However, I do not regard an audio review as a scientific experiment. It is a critique and should be taken as such. A scientific evaluation is another thing and of course should follow a scientific protocol.

Personally, I believe the subjective and objective schools both have something to offer, and never the twain shall meet. Maybe it's the lawyer in me, but I like a healthy debate. Unfortunately it always descends into a food fight.

Once I cool down I try to read everything suggested to me.
 
I recall the phrase "close your eyes and imagine"; we can imagine far more blind than sighted.
 
Amir,

Just imagine the following experiments and results using A and B:

You ask 1000 people to express their preference under DBT and 950 prefer A
You ask another 1000 people and 940 also prefer A

Then you carry out an ABX test using the same people and you find that they cannot separate them.

What would be your possible conclusions?
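Just for scale, here is a rough sketch in plain Python (my own illustration, not test data) of how far 950 out of 1000 is from what a fair coin would produce:

```python
# Exact one-sided binomial tail: probability of seeing 950 or more "prefer A"
# out of 1000 listeners if everyone were actually choosing at random (p = 0.5).
from math import comb

n, k = 1000, 950
p_value = sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n
print(f"P(>= {k} of {n} by chance) = {p_value:.2e}")  # effectively zero
```

Whatever explains the split between the two tests, chance alone is not a candidate for the preference result.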
 
...

What would be your possible conclusions?

One possible conclusion is that most people, if unable to determine any difference between A and B but required to choose, will choose A.
The test would need to control for this, by performing a round of testing where A and B are identical.
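A rough sketch of what that control round would show, with a made-up "default to A" rate (all numbers hypothetical):

```python
# Simulate the control round where A and B are the identical file. A listener
# who hears no difference but habitually defaults to "A" when forced to choose
# produces a skewed "prefer A" rate even here, and that rate -- not 50% -- is
# the baseline the real comparison should be judged against.
import random

random.seed(1)
DEFAULT_TO_A = 0.7  # assumed strength of the "pick A when unsure" habit

null_round = ['A' if random.random() < DEFAULT_TO_A else 'B' for _ in range(1000)]
print(f"'Prefer A' on identical stimuli: {null_round.count('A') / 1000:.2f}")  # ~0.70
```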
 
To be a quack you have to be practicing some profession. Just curious: what profession am I a quack in? What friend are you talking about?
Don't answer that; I am not interested in trading insults. Please feel free to continue alone. It says more about you than it does about me.

You feel your reputation has been injured? I don't even know who you are.
Let me afford you this opportunity, on behalf of the members, to tell us who you are. I do know a guy named j.j. who set up some sort of A/B test with a broken amp and appeared with Ethan in a video. Is that you?

Maybe you can tell us your real name, degrees if any, publications, whether you have produced any audio products, etc. If you have already done that somewhere, maybe you could give me a link. Right now you are just a guy trying to shout me down.

Look who's talking.

First, an uninformed person can spout quack science.

Second, who I am is easily determined, unlike yourself. Due diligence?
 
Second thing I would do is try damn hard to find one song where it does make a difference.

Well, that speaks to the lack of positive controls, for sure.

(I know you know this, Amir, but I'll repeat this for others.)

Any test needs both positive controls and negative controls.
A positive control is something the subject SHOULD hear. A failure there suggests that the test is broken.
A negative control is presenting the exact same thing twice (something you can't do well with LPs and tape, by the way). You should not see any discrimination there. If you do, something is wrong.
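To make that concrete, here is a minimal sketch of a trial list with both kinds of control folded in; the file names, counts, and the 1 dB level bump are placeholders, not a prescription:

```python
# Build a randomized trial list that embeds a positive control (a difference
# the listener should hear, here a clear level change) and a negative control
# (the same file presented twice) alongside the real comparison.
import random

def build_trials(n_real=12, n_pos=4, n_neg=4):
    trials = (
        [('device_A.wav', 'device_B.wav', 'real')] * n_real
        + [('ref.wav', 'ref_plus_1dB.wav', 'positive_control')] * n_pos
        + [('ref.wav', 'ref.wav', 'negative_control')] * n_neg
    )
    random.shuffle(trials)
    return trials

# Missing the positive controls suggests an insensitive (broken) test;
# "hearing" differences on the negative controls suggests bias or a flawed rig.
for a, b, kind in build_trials():
    print(f"{kind:17} {a} vs {b}")
```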


As far as picking an audio product, how it looks is important. ABX testing and DBT testing are for scientific use. When you pick out equipment, you pick out what satisfies you, i.e. what you prefer. That's all.
 
However, I do not regard an audio review as a scientific experiment.

Agreed, it is a reviewer commenting on their personal preference, and there's no need for "proof" or blind testing unless the reviewer PREFERS to do it that way.

If you would bother to look around a bit, you will find that I have grumped both gently and rather loudly at people who scream "PROVE YOUR PREFERENCE IS RIGHT". There is no right to preference, everyone has one, and as long as it doesn't pass the tip of someone else's nose, so be it.
 
Amir,

Just imagine the following experiments and results using A and B:

You ask 1000 people to express their preference under DBT and 950 prefer A
You ask another 1000 people and 940 also prefer A

Then you carry out an ABX test using the same people and you find that they cannot separate them.

What would be your possible conclusions?

You left out a critical factor:

If the A/B preference test is sighted, you're testing different things with the A/B and an ABX test.

So you should expect different results.
 
One possible conclusion is that most people, if unable to determine any difference between A and B but required to choose, will choose A.
The test would need to control for this, by performing a round of testing where A and B are identical.

You have a point; such a test lacks both positive and negative controls as well.
 
I did listen to the tracks and found they were too short for me to form an opinion. Moreover, when I got a fix on the cello, the other music wiped it out. I just could not remember what A sounded like long enough to compare A to B, C, and D.

Some useful numbers here:

1) Primary loudness memory (the most sensitive determination) lasts well under 200 milliseconds. This is why test setups need clickless, rapid switching. You don't want to mess up the memory of the first signal.
2) Memory of auditory features, in the absence of competing information, lasts a few seconds (i.e. short-term memory).
3) Memory of auditory objects, i.e. 'that's a cello', lasts into long-term memory (which is far from permanent, but that's another problem).

A test that allows you to switch between the things you are comparing has been repeatedly demonstrated to give the best results.

Now, your point about 'getting a fix on the cello' is also why the subject should be able to loop the sound, in order to prevent competing information (other music in this case) from wiping it out.

And, yes, you have to be comfortable, relaxed, and extremely familiar with everything involved in order to get a sensitive result. One does not just jump into the test willy-nilly.
 
One possible conclusion is that most people, if unable to determine any difference between A and B but required to choose, will choose A.
The test would need to control for this, by performing a round of testing where A and B are identical.

Yes, that bias has been identified in JND research, and if A and B are identical it will trigger the bias: if the person cannot tell a difference, or is unsure, they will go with an instinctive primary choice, which, as you say, usually seems to be A, rather than saying they do not know and passing.
I am oversimplifying something that is covered in some very extensive research papers, but it is an interesting effect, and one reason part of a forced-choice selection can require subjects to identify a defined associated perceivable variable, or to state if they are unsure or cannot tell a difference.

I assume you mention implementing identical A/B to see how well the forced-choice framework works?
This is one of my concerns with some (emphasis on some) of those who randomise ABX DBT audio selection: they do not remove the bias but actually hide it from the statistics.
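To make that last point concrete, here is a rough sketch (purely illustrative, not taken from any of the papers above) of why randomisation hides the bias rather than removing it:

```python
# In a randomised ABX, X is A or B at random on each trial. A listener who
# hears no difference but always answers "X is A" still scores about 50%
# correct, so the hit rate looks like honest guessing; the bias only shows up
# in the raw answer distribution (or in dedicated null trials).
import random

random.seed(1)
trials, correct, answered_a = 1000, 0, 0
for _ in range(trials):
    x_is_a = random.random() < 0.5  # X randomly assigned to A or B
    answered_a += 1                  # biased listener: always claims X is A
    correct += x_is_a                # that answer is right only when X is A
print(f"Hit rate: {correct / trials:.2f}")             # ~0.50, looks like chance
print(f"'X is A' answers: {answered_a / trials:.0%}")  # 100%, the hidden bias
```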

Cheers
Orb
 
Amir,

Just imagine the following experiments and results using A and B:

You ask 1000 people to express their preference under DBT and 950 prefer A
You ask another 1000 people and 940 also prefer A

Then you carry out an ABX test using the same people and you find that they cannot separate them.

What would be your possible conclusions?

You left out a critical factor:

If the A/B preference test is sighted, you're testing different things with the A/B and an ABX test.

So you should expect different results.

I think I was clear in stating that the preference tests were not sighted ...
 
One possible conclusion is that most people, if unable to determine any difference between A and B but required to choose, will choose A.
The test would need to control for this, by performing a round of testing where A and B are identical.

Good point; now I know I must perfect the test. I get 2000 more people to repeat it, but the second time I reverse the order, and the preference for the same object remains, with a similar ranking. How should I interpret it?
 
... How should I interpret it?

If I understand you correctly, you're saying that for 2 samples of audio, in "blind" listening, there can simultaneously be a statistically significant preference for one over the other, and no significant difference when tested via ABX. I find that unlikely. I would want to see the actual results of the testing before entertaining the possibility that both cases are correct. Personally, I would look at it from another angle: Based on what we know about how we hear, is it possible to generate audio signals which produce the result you have described?
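As a rough sanity check (my own illustrative numbers, not actual test data): if listeners really do prefer A about 95% of the time under blind conditions, they can evidently tell A from B, so even a short ABX run should land far from chance.

```python
# Expected score on a 16-trial ABX at a 0.95 discrimination rate is about 15
# correct; the exact probability of 15 or more correct by pure guessing:
from math import comb

n, k = 16, 15
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"P(>= {k}/{n} correct by guessing) = {p_value:.4f}")  # ~0.0003, well below 0.05
```

That is why a strong blind preference paired with a null ABX result from the same listeners would be such a surprising combination.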
 