While I have disagreed with Micro on some other issues, I want to emphasize this question he posed. I think it is not only an excellent question but also directly on point to the OT. Indeed, I have asked this question of myself and others many times.
I will slightly rephrase the question by adding two words: "what would be the alternative ways of looking for small differences without bias if ABX was patented and the owners of the patent did not allow any use of it?"
To be fair, there is subtle bias in ALL of the near-threshold (just noticeable difference) sensory testing procedures.
Looking at it from a scientific view, the tests need to be tied to a model that allows the bias to be weighted while also providing a hit rate and a false alarm rate.
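To make that concrete, and purely as my own illustrative sketch assuming the standard equal-variance Gaussian signal detection model, this is roughly how a hit rate and a false alarm rate are converted into a sensitivity index (d') and a bias term (c):

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    # Sensitivity: separation of the "different" and "same" internal
    # distributions, in standard-deviation units.
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def bias_c(hit_rate, fa_rate):
    # Response bias: 0 is neutral; positive means the participant is
    # conservative (leans toward responding "no difference").
    return -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))

# Example: 80% hits, 30% false alarms
print(d_prime(0.80, 0.30))  # ~1.37
print(bias_c(0.80, 0.30))   # ~-0.16 (slightly liberal)
```

The point is that percent correct on its own conflates sensitivity with bias; it is the model that separates the two.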
The problem, though, is that participants' cognitive approach has been shown to split between two strategies; these are, as I mentioned earlier, the difference decision (differencing) strategy and the independent observations strategy.
This is compounded by the fact that various models have shown the participant's sensitivity and accuracy differ between the two, and critically, if the wrong decision strategy is assumed then the ROC used in the analysis is also wrong, which affects the conclusions drawn from the data.
Modelling the participant's behaviour is absolutely critical, because as detection becomes harder the % accuracy drops, and it is the associated signal detection theory model that is needed to weight the conclusions and validate the test methodology and process.
The ideal cognitive decision process for a participant is the independent observations strategy; however, this is counter-intuitive to most, who unfortunately default to the difference decision strategy in ABX.
This, IMO, is compounded by having two references, and by the fact that most forum discussions focus ABX on the worst-case scenario of JND (just noticeable difference).
It is further compounded by the fact that ABX relies not just on detection but also on identification, and the two are separate entities in terms of the cognitive decision process.
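To illustrate why the assumed strategy matters, here is a small Monte Carlo sketch of an ABX trial. It is my own toy model (equal-variance Gaussian observations, with the means, variances and decision rules chosen by me for illustration, not taken from any specific published paper): it compares the difference strategy (judge X against the noisy observations of A and B on that trial) with an idealised independent observations strategy (classify X alone against a stable, learned internal midpoint).

```python
import numpy as np

rng = np.random.default_rng(0)

def abx_percent_correct(dprime, strategy, n=200_000):
    # A evokes internal observations ~ N(0, 1), B evokes ~ N(dprime, 1).
    mu_a, mu_b = 0.0, dprime
    x_is_a = rng.random(n) < 0.5
    a = rng.normal(mu_a, 1.0, n)                           # interval 1 (labelled A)
    b = rng.normal(mu_b, 1.0, n)                           # interval 2 (labelled B)
    x = rng.normal(np.where(x_is_a, mu_a, mu_b), 1.0, n)   # interval 3 (X)
    if strategy == "difference":
        # Pick whichever reference observation X is closer to.
        say_a = np.abs(x - a) < np.abs(x - b)
    else:
        # Classify X directly against the learned midpoint, ignoring
        # the noisy reference intervals on this trial.
        say_a = x < (mu_a + mu_b) / 2.0
    return (say_a == x_is_a).mean()

for d in (0.5, 1.0, 2.0):
    print(d, abx_percent_correct(d, "difference"),
          abx_percent_correct(d, "independent"))
```

On this toy model the difference strategy scores consistently lower than the independent observations strategy at the same d', which is exactly why fitting the wrong strategy's ROC to the data misstates the participant's real sensitivity.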
Coming onto the alternatives:
Well, the 1st is the same-different methodology. However, I was a bit cheeky earlier, I must admit, as I really should have pointed out that early research in the 60s (albeit focused on Yes-No) identified certain response biases, along with training effects in participants (a scientific test may want to study those who have not become trained in specifically detecting the trait, so as to reflect something close to a natural selection of listeners). Also, I think one of Tom Nousaine's reasons for switching away from AB was the loss of sensitivity he felt AB caused.
However, if done correctly, and that is the key point, it is still recognised as being good for detecting very nearly similar sensory values where magnitude etc. are not important (ABX may help in situations where there is wider variance in the stimuli, or when defining observer factors like magnitude, along with other test methodologies that are not necessarily JND-focused).
http://www.astm.org/Standards/E2139.htm
However what the link does not show is the correspondence discussing same-different.
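As a side note on that response-bias point: in same-different, the participant effectively sets their own internal criterion for how big a difference counts as "different", and the placement of that criterion is exactly where the bias enters. A rough sketch, again using my own toy differencing-rule model with unit-variance Gaussian assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def same_different_rates(dprime, k, n=200_000):
    # Differencing rule: respond "different" when the absolute internal
    # difference between the two intervals exceeds a criterion k.
    diff_trials = np.abs(rng.normal(dprime, np.sqrt(2), n))  # A-B pairs
    same_trials = np.abs(rng.normal(0.0, np.sqrt(2), n))     # A-A / B-B pairs
    hit = (diff_trials > k).mean()   # correctly saying "different"
    fa = (same_trials > k).mean()    # falsely saying "different"
    return hit, fa

for k in (1.0, 2.0, 3.0):  # lax, moderate, strict criteria
    print(k, same_different_rates(1.0, k))
```

The same underlying sensitivity produces very different raw percentages depending on where k sits, which is why the raw same/different counts need the model behind them before any conclusion is drawn.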
Beyond same-different (2IAX) there is the more popular 4IAX.
This is four-interval same-different, where two pairs of stimuli are presented in intervals to the participant: the stimuli in one pair of intervals (the 1st or the 2nd) are the same (AA or BB), and the stimuli in the other pair are different.
This works very well and has been used a lot, and still is; from what I understand it overcomes many of the concerns relating to response bias and has a usable ROC (though from what I understand now, 2IAX does as well).
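For completeness, the same toy framework extends to 4IAX, and it shows why the bias concern largely goes away: under a simple differencing rule the participant just reports whichever pair produced the larger internal difference, so there is no free-floating criterion at all (again a sketch under my own unit-variance Gaussian assumptions, not a derivation of the published ROC):

```python
import numpy as np

rng = np.random.default_rng(2)

def fouriax_percent_correct(dprime, n=200_000):
    # One pair is "different" (internal difference ~ N(dprime, 2)), the
    # other is "same" (difference ~ N(0, 2)); choose the pair with the
    # larger absolute difference, so no subjective criterion is needed.
    diff_pair = np.abs(rng.normal(dprime, np.sqrt(2), n))
    same_pair = np.abs(rng.normal(0.0, np.sqrt(2), n))
    return (diff_pair > same_pair).mean()

for d in (0.5, 1.0, 2.0):
    print(d, round(fouriax_percent_correct(d), 3))
```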
Then there is 2AFC, the two-alternative forced choice, where the participant is forced not only to detect the change but also to identify the characteristic that changed; the issue is that at JND level this becomes incredibly hard, and it relies on an accurate ROC for the more difficult detection testing.
This, IMO, is not an ideal methodology when talking about possibly the smallest-JND stimuli, say near-identical power amps using different topologies (comparing an average Class A to an average Class AB), where the evidence so far suggests that if there are differences they are so small as not to be detectable by ABX.
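To put a number on how little headroom there is near JND: for an unbiased observer under the standard equal-variance Gaussian model, 2AFC percent correct maps to sensitivity as d' = sqrt(2) * z(pc), so even a respectable-looking score corresponds to a small d'. A quick sketch:

```python
import numpy as np
from scipy.stats import norm

def dprime_2afc(pc):
    # Unbiased 2AFC under the equal-variance Gaussian model:
    # d' = sqrt(2) * z(proportion correct).
    return np.sqrt(2) * norm.ppf(pc)

for pc in (0.60, 0.75, 0.90):
    print(pc, round(dprime_2afc(pc), 2))
# Even 75% correct in 2AFC corresponds to d' of only ~0.95.
```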
Looking back, I think we all forget that each test has a place, but we have mostly been applying a single absolute in the discussion, either focusing on minuscule just noticeable differences or treating it as a broad spectrum from small to reasonable sensitivity.
It is important to remember that there is a sliding scale relating to just noticeable differences and the difficulty of both detection and identification.
Which test is used should critically consider: whether both detection and identification are required; the purpose of the test, which may deal not with near-identical stimuli but with larger differences between two groups and a more widely varying X; the level of sensitivity or detection needed; and, critically, how to apply the methodology (various subtle biases can be introduced into any of these, including ABX) and how to weight and validate the results and participant behaviour.
Coming to behaviour, and fitting with the above paragraph, one also needs to consider the participant's cognitive decision process (difference decision strategy or independent observations strategy), which can affect results and critically affects which model is used to assist in validation, bias weighting, and interpreting results.
Scientifically these are essential.
OK, that is as far as I am going with the science side; hopefully this fits with Amir's request to refocus on it from a balanced, scientific perspective, which I feel he was right to suggest.
There may be a few mistakes in here, so bear with it; I will check it later on today, or when the next ABX-related thread is opened by someone in, say, a month.
Cheers
Orb