Is ABX Finally Obsolete?

Status
Not open for further replies.
I would just like to reiterate, as I in fact acknowledged in a previous post, that I am fully in agreement with ABX being excellent for testing variations between slightly different audio tracks, with no change in the physical configuration of the playback chain. That I have no trouble with at all ...

Frank
 
Maybe we could do an exercise: what would be the alternative ways of looking for small differences if ABX were patented and the owners of the patent did not allow any use of it?
While I have disagreed with Micro on some other issues, I want to emphasize this question he posed. I think it is not only an excellent question, but it also is directly on point to the OT. Indeed I have asked this question of myself and others many times.

I will slightly rephrase the question by adding two words: "what would be the alternative ways of looking for small differences without bias if ABX was patented and the owners of the patent did not allow any use of it?"
 

To be fair, there is subtle bias in ALL of the just-noticeable-difference sensory testing procedures.
When looking at it from a scientific view, the tests need to be tied to a model that enables the bias to be weighted while also providing a hit rate and a false-alarm rate.
The problem, though, is that it has been shown that a participant's cognitive approach can follow one of two strategies; these are, as I mentioned earlier, the difference decision strategy and the independent observations strategy.
This is compounded by the fact that various theorems and models have measured and shown that a participant's sensitivity and accuracy differ between the two; critically, if the wrong decision strategy is assumed, then the ROC used is also wrong, and this affects the conclusions drawn from the data.
Modelling the participant's decision behaviour is absolutely critical, because as detection becomes harder the percentage accuracy drops, and it is the associated signal detection theory model that is needed to weight the conclusions and validate the test methodology and process.
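As a concrete sketch of what "a hit rate and a false-alarm rate tied to a model" means: under the standard equal-variance Gaussian signal detection model, the sensitivity index d' and the response bias c can be computed from those two rates. This is a minimal illustration of the textbook formulas, not part of any specific test standard:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' under the equal-variance Gaussian
    signal detection model: d' = z(H) - z(F)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

def criterion(hit_rate, false_alarm_rate):
    """Response bias c: 0 is unbiased, positive is conservative."""
    z = NormalDist().inv_cdf
    return -0.5 * (z(hit_rate) + z(false_alarm_rate))

# A listener with 80% hits and 20% false alarms:
# d' is about 1.68, with no response bias (c = 0).
```

The point of separating d' from c is exactly the weighting mentioned above: the same hit rate can come from a sensitive listener or from a bias toward answering "yes", and only a model distinguishes the two.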

The ideal cognitive decision process for a participant is the independent observations strategy; however, this is counter-intuitive to most, who unfortunately default to the difference decision strategy in ABX.
This is, in my opinion, compounded by having two references, and by the fact that most forum discussions focus ABX on the worst-case scenario of JND (just noticeable difference).
Furthermore, the task relies not just on detection but also on identification, and the two are separate entities in terms of the cognitive decision process.
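To make the difference between the two strategies concrete, here is a small Monte Carlo sketch of a same-different task under the equal-variance Gaussian model. It is my own illustration, not from any post in this thread; the differencing threshold k is a hand-picked value, not an optimised one:

```python
import random

def simulate_same_different(dprime, n_trials=20000, seed=42):
    """Apply two decision rules to the same simulated same-different
    trials. Stimulus A draws from N(0,1), stimulus B from N(dprime,1).
    Returns (proportion correct for independent observations,
    proportion correct for differencing)."""
    rng = random.Random(seed)
    mid = dprime / 2.0   # unbiased classification criterion for each interval
    k = 1.4              # illustrative differencing threshold (not optimised)
    correct_io = correct_diff = 0
    for _ in range(n_trials):
        different = rng.random() < 0.5
        if different:
            x1, x2 = rng.gauss(0, 1), rng.gauss(dprime, 1)   # AB
        else:
            mu = dprime if rng.random() < 0.5 else 0.0       # AA or BB
            x1, x2 = rng.gauss(mu, 1), rng.gauss(mu, 1)
        # independent observations: label each interval, compare labels
        io_says_diff = (x1 > mid) != (x2 > mid)
        # differencing: respond "different" on a large absolute difference
        diff_says_diff = abs(x1 - x2) > k
        correct_io += (io_says_diff == different)
        correct_diff += (diff_says_diff == different)
    return correct_io / n_trials, correct_diff / n_trials
```

In this sketch the independent-observations rule is reliably more accurate than the differencing rule at the same d', which is the point above: if the analyst assumes the wrong strategy, the listener's sensitivity is misestimated even though the raw data are identical.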

Coming onto the alternatives:
Well, first is the same-different methodology. I was a bit cheeky earlier, I must admit, as I really should have pointed out that early research in the 60s (albeit focused on Yes-No tasks) identified certain response biases and issues with training of the participant (the scientific test may want to study observers who have not been specifically trained to detect the trait, so as to reflect something close to natural selection). Also, I think one of Tom Nousaine's reasons for switching away from AB was the loss of sensitivity he felt AB caused.
However, if done correctly (and that is the key point), it is still recognised as being good for detecting very similar sensory values where magnitude etc. are not important (ABX may help in situations where there is wider variance in the stimuli, or where observer factors like magnitude need defining, along with other test methodologies that are not necessarily JND-focused).
http://www.astm.org/Standards/E2139.htm
However what the link does not show is the correspondence discussing same-different.

Beyond same-different (2IAX) there is the more popular 4IAX.
This is four-interval same-different: four stimuli are presented over two pairs of intervals; the stimuli in one pair of intervals (the first or the second) are the same (AA or BB), and the stimuli in the other pair are different.
This works very well and has been, and still is, used a lot; from what I understand it overcomes many of the concerns relating to response bias and has a usable ROC (though, from what I understand now, 2IAX does as well).

Then there is 2AFC, the two-alternative forced choice, in which the participant is forced not only to detect the change but also to identify it; the issue is that near the JND this becomes incredibly hard and relies on an accurate ROC for the more difficult detection testing.
This, in my opinion, is not an ideal methodology when talking about possibly the smallest JND stimuli, say near-identical power amps using different topologies (comparing an average Class A to an average Class AB), where the results so far suggest that if there are differences they are too small to be detectable by ABX.
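As an illustration of why 2AFC gets hard near the JND: under the equal-variance Gaussian model, 2AFC proportion correct relates to sensitivity as Pc = Phi(d'/sqrt(2)), so tiny d' values sit barely above chance. A minimal sketch of the two standard conversions (my illustration, not a claim about any specific test in this thread):

```python
from statistics import NormalDist

def pc_2afc(d_prime):
    """Expected 2AFC proportion correct for sensitivity d'
    (equal-variance Gaussian model): Pc = Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d_prime / 2 ** 0.5)

def d_prime_from_2afc(pc):
    """Inverse mapping from observed proportion correct:
    d' = sqrt(2) * z(Pc)."""
    return 2 ** 0.5 * NormalDist().inv_cdf(pc)

# A barely-detectable difference (d' = 0.2) yields only about 56%
# correct in 2AFC, which takes many trials to separate from chance.
```

This is the quantitative version of "incredibly hard": at 56% correct versus 50% chance, a statistically convincing result needs hundreds of trials, which is why the choice of methodology matters so much at the smallest JNDs.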

Looking back, I think we all forget that each test has its place; we have mostly been applying a single absolute in the discussion, either focusing on minuscule just noticeable differences or on a broad spectrum from small to reasonable sensitivity.
It is important to remember that there is a sliding scale of just noticeable differences, and of difficulty in both detection and identification.
What test is used should critically consider: whether both detection and identification are required; the purpose of the test, which may deal not with near-identical stimuli but with larger factors between two groups and a more widely varying X; the level of sensitivity or detection; and, critically, how to apply the methodology (various subtle biases can be introduced into any of these, including ABX) and how to weight and validate the results and participant behaviour.
Coming to behaviour, and fitting with the above paragraph, one also needs to consider the participant's cognitive decision process (difference decision strategy or independent observations strategy), which can affect results and critically determines which model is used to assist in validating, weighting and de-biasing the results.
Scientifically, these are essential.

OK, that is as far as I am going with the science side; hopefully this fits with Amir's request to refocus on it from a balanced, scientific perspective, which I feel he was right to suggest.
There may be a few mistakes in here, so bear with it; I will check it later today, or when the next ABX-related thread is opened in, say, a month by someone :)

Cheers
Orb
 
While I have disagreed with Micro on some other issues, I want to emphasize this question he posed. I think it is not only an excellent question, but it also is directly on point to the OT. Indeed I have asked this question of myself and others many times.

I will slightly rephrase the question by adding two words: "what would be the alternative ways of looking for small differences without bias if ABX was patented and the owners of the patent did not allow any use of it?"

We would drop the X.

Tim
 
Forget ABBA too. That's taken by some Swedes. ;)
 
It's not I who claim ABX is too hard. It's the people who advocate it. Without exception they have all claimed it's too difficult to do with any consistency.

I would like to respectfully submit that your study of *all* of the people who are doing DBTs for the purpose of audio evaluation appears to be a tiny bit less than comprehensive.

You're jumping to far-reaching conclusions based on what? A few PR papers released by just one company?

I happen to know of nobody who routinely does ABX tests who thinks that it is hard to do them with consistency, and I know dozens of people who routinely do ABX tests.
 
We would drop the X.

Seems like that would drop you back to a same/difference test, which is really either an AX, XA, BX, or XB test. As long as you provide an opportunity to get the wrong answer, it appears that you have to have an X. If you don't have an opportunity to get the wrong answer then you have failed a basic rule of experimental design, which is that the test has to provide some means for the hypothesis to be falsified.
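On "some means for the hypothesis to be falsified": ABX results are conventionally scored with an exact binomial test against chance, since a guessing listener gets each trial right with probability 0.5. A minimal sketch (my illustration, not from any post in this thread):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial p-value: the probability of scoring at
    least `correct` out of `trials` ABX trials by pure guessing
    (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12 of 16 correct is significant at the usual 5% level (p ~ 0.038),
# while 11 of 16 is not (p ~ 0.105).
```

The X is what makes this arithmetic possible: without a trial that can be answered wrongly, there is no chance level to test against.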
 
To be fair, there is subtle bias in ALL of the just-noticeable-difference sensory testing procedures.

IME you are stopping way too early. There is subtle or non-subtle bias in every test, even tests involving test equipment. So the world of audiophilia responds to that by not trying to effectively address bias at all. In fact bias is an effective sales tool.
 
PS. What is the purpose of testing a hypothesis if it is not to prove it or disprove it? Intellectual masturbation?

In high end audio the purpose of testing a hypothesis is to sell the product at hand. We all know that, right?

The whole problem with ABX and other DBT methods, and in fact with all of science, is the same - they don't reliably produce the conclusions that lead to the greatest profits.
 
I also fundamentally disagree that ABX was created to determine if there is or isn't a difference. I believe it was created knowing that there ARE differences. The purpose is to determine if the differences are statistically significant as far as a sample population is concerned.

This is true on a number of levels.

On a philosophical level, only positive hypotheses are readily provable. "Negative hypotheses are difficult or impossible to prove".

On a practical level, ABX was, in the big picture, devised to resolve the question of whether or not there are differences. But, the person who did the bulk of the initial work and produced the first working system and used it the first time (me) was of the "pro differences" opinion.
 
The ideal cognitive decision process for a participant is the independent observations strategy; however, this is counter-intuitive to most, who unfortunately default to the difference decision strategy in ABX.
This is, in my opinion, compounded by having two references, and by the fact that most forum discussions focus ABX on the worst-case scenario of JND (just noticeable difference).
Furthermore, the task relies not just on detection but also on identification, and the two are separate entities in terms of the cognitive decision process.

Coming onto the alternatives:
Well, first is the same-different methodology. I was a bit cheeky earlier, I must admit, as I really should have pointed out that early research in the 60s (albeit focused on Yes-No tasks) identified certain response biases and issues with training of the participant (the scientific test may want to study observers who have not been specifically trained to detect the trait, so as to reflect something close to natural selection). Also, I think one of Tom Nousaine's reasons for switching away from AB was the loss of sensitivity he felt AB caused.
However, if done correctly (and that is the key point), it is still recognised as being good for detecting very similar sensory values where magnitude etc. are not important (ABX may help in situations where there is wider variance in the stimuli, or where observer factors like magnitude need defining, along with other test methodologies that are not necessarily JND-focused).
http://www.astm.org/Standards/E2139.htm
However what the link does not show is the correspondence discussing same-different.

That particular document disqualifies itself as an audio testing methodology:

It says:

ASTM said:
1.8 This test method may be chosen over the triangle or duo-trio tests where sensory fatigue or carry-over are a concern, or where a simpler task is needed

Same/difference testing involves one comparison per trial. Each trial is thus an all-or-nothing situation. It makes a lot of sense when the stimulus involves some element of surprise. For example, in an articulation test, repeating the comparison will introduce learning, and that learning would make the test asymmetrical with the actual real-world performance that is being tested.

If we only listened to a sample of reproduced music just once in our lives, then this kind of same-different test would make more sense. But this is not how most of us listen to music. We often listen to the same selection, and even the same musical passage, over and over again.

This contrasts with a live musical performance that most attendees only hear once. It is well known among practitioners that the fact that a live musical event is heard by most attendees only once makes the audience more accepting of less-than-perfect results.

I can tell you from practical experience that even if you screw up pretty badly early in a live event, as long as you keep things straight for the last 15 minutes or so, the error early in the event will largely be forgotten.
 
Arny,
It seems you're saying now that an international organisation is wrong, the scientists who use 4IAX are wrong, and careful 2IAX is wrong, along with the more difficult 2AFC (which is challenging, in my opinion, but used more than ABX in other disciplines), but you are right.
And these tests still involve tones, melodies, etc., as well as other stimuli such as comparing two products by taste.
However, what is the point in discussing testing methodology when you manage to ignore the scientific background and intention of signal detection theory, the primary use of same-different and its comparables, and then ABX in relation to points that seem to be unknown or non-ideal, such as participants defaulting to the difference decision strategy instead of the independent observations strategy, which may be critical when talking about incredibly small JNDs.
Triangle and duo-trio are still used today in the ways they mention.
The key aspect, though, is the ROC, as I mentioned earlier, and understanding the decision strategy of the participant so that sensitivity and accuracy can be calculated.

Anyway, if you feel ASTM are wrong, by all means apply to be on the technical committee when the standards are being discussed.
To me, it comes across as if ABX is the only testing approach, to the exclusion of what is being done scientifically and in modern signal detection theory and cognitive behaviour.
But again, Arny:
Where is the data to validate ABX against other well-known JND methodologies?
It would be great if we had this, and in my opinion it would boost ABX in debates with those who may be skeptical.
If you still do ABX, why not compare results using the same participants with one of these methodologies? However, I appreciate you may feel this is unwarranted due to your conviction that ABX works and that its results are calculated within acceptable parameters.

Edit:
Just to add, there are other reasons to use ABX, such as determining an observer's decision when X is not identical to A or B; it could be for the observer to decide whether a tone that is in between is closer to A or B, or whether a colour is closer to A or B. In both cases there are no identical matches. A very simplistic example, so please no bashing :)

Cheers
Orb
 
Wouldn't be surprised if some in this forum would rather drop the unsighted part

And, as a result, the "without bias" part, or any attempt in that direction. Many audiophiles so love their bias that they have not only embraced it as reality, they see it as the only possible reality. Plato's cave, indeed.

Tim
 
Arny,
have you ever heard of ASTM?

You're actually seriously asking me that question, no joke?

I've known about the ASTM since I was a little kid in the 50's, reading the books in my daddy's bookcase.



So now an international organisation is wrong, the scientists who use 4IAX are wrong, and careful 2IAX is wrong, along with the more difficult 2AFC (which is challenging, in my opinion, but used more than ABX in other disciplines), but you are right.

Still you aren't joking? Saying that a standard does not apply to a particular situation (which is what I did) is of course vastly different from saying that the standard is wrong. Another way to look at it is that I used an ASTM document as my authority for saying that an ASTM document does not apply to a situation that said document is silent about.



However what is the point in responding to you about testing methodology as you manage to ignore the scientific background and intention of signal detection theory and the primary use of same-different and its comparables,

Again, far-reaching conclusions based on exactly what evidence? I wrote a post about the applicability of a certain document, not a PhD thesis about subjective testing. If the fact that I didn't write a PhD thesis for your pleasure disturbs you, perhaps you need to get the moderator to update the terms of service to make that a prerequisite!

and then ABX relating to points that seem to be unknown or non-ideal such as participants defaulting to cognitive behaviour of difference decision strategy instead of independent-observation decision strategy, which when talking about incredibly small JND may be critical.

Where did I say all that?

Triangle and duo-trio are still used today in the ways they mention.

I've seen the opinion granted that ABX is just a slight variation on a triangle test.


Anyway, if you feel ASTM are wrong, by all means apply to be on the technical committee when the standards are being discussed.

Straw man argument, as I have already explained.

To me, it comes across as if ABX is the only testing that has any meaning in your world, to the exclusion of what is being done scientifically and in modern signal detection theory and cognitive behaviour.

To me, your ignorance of what I've said in a variety of contexts and forums is noted. So is your ongoing willingness to make far-reaching accusations based on what is at best described as highly limited knowledge.
 
With myself and Arny discussing different methodologies, and both of us coming back to response-related bias, this has me thinking.
Possibly the ideal way of initially testing for just noticeable differences, in the context of this thread, is not to use actual music but to condense it down to specific, well-recorded polyphonic chords and tones from various instruments; this gives us a complex waveform with attack and decay that is easier for the observer to analyse from a reference-point perspective.
In theory, IF amps do sound different with music, then by using chords and tones generated by real instruments (solo, or as a group or symphony) the differences will still be there.
The difficulty is deciding what to use; an initial study should cover a whole range and then narrow it down, or shortcuts could be taken and what are thought to be the ideal examples used to test two sensorially similar products.

Maybe someone could think about this further and create quality test tracks based on just held polyphonic chords and notes.
Cheers
Orb

To my mind this is possibly one aspect that is wrong with current testing, if one is specifically investigating only whether, for example, "all amps sound the same".
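The suggestion of held polyphonic chords as test material can be prototyped in a few lines. A hedged sketch (the frequencies, attack time and decay rate are arbitrary choices of mine for illustration, not a proposed standard; real recorded instruments would of course be richer than summed sines):

```python
import math

def chord_samples(freqs, seconds=2.0, rate=44100, decay=3.0):
    """Sum of equal-amplitude sines with a 10 ms linear attack and an
    exponential decay, roughly imitating a struck polyphonic chord.
    Returns a list of floats in [-1.0, 1.0]."""
    n = int(seconds * rate)
    attack = int(0.01 * rate)  # 10 ms attack to avoid a click
    out = []
    for i in range(n):
        t = i / rate
        env = min(1.0, i / attack) * math.exp(-decay * t)
        s = sum(math.sin(2 * math.pi * f * t) for f in freqs) / len(freqs)
        out.append(env * s)
    return out

# An A major triad (A4, C#5, E5) as a two-second test signal:
chord = chord_samples([440.0, 554.37, 659.25])
```

Samples like these could be written out with the standard library's `wave` module and level-matched for use as the narrowed-down stimuli discussed above.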
 
