As the original poster, I think we are getting lost in the results. It's the process we should consider.
I think someone else raised the question: how does the ordinary listener match levels?
The problem I feel, though, Gregadd, is that the article does not cover anything I have just been discussing with Arny, or other factors. It seems to focus on the usual points and is not very helpful either way, whether for those who support ABX or for those who have reservations about it.
Even discussing level matching could be lengthy, as I would argue that its key uses are removing cues or tells and, critically, keeping level as a controlled factor so the end test data is usable.
Others may put forward that there is a preference for louder playback, but it can also be said that there is an ideal loudness for any track, which may actually be lower.
While we have the thresholds for loudness across the frequency response (FR), this does not necessarily help in concluding how two different volumes affect the listener's subjective choice when they concentrate on soundstage, timbre, etc.
There is anecdotal evidence for both sides of this, but what is universal is that level matching is essential, both as a controlled variable (to assist with the test data at the end) and to remove cues/tells. Of course, the cues/tells issue can also be dealt with if the volume is reset in a way that stops the listener from working out its position, or the turn of the volume knob, for each A/B switch.
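For anyone wondering what "matching levels" means numerically, here is a minimal sketch, assuming you have the two samples as lists of audio values. It measures the RMS level of each, reports the difference in dB, and computes the linear gain that would bring B down (or up) to A. The function names are hypothetical, and plain RMS is a simplification: it is not loudness-weighted, and a common practical alternative is matching with a test tone and a voltmeter at the speaker terminals.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples (unweighted)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(a, b):
    """Level of b relative to a, in dB (positive means b is louder)."""
    return 20.0 * math.log10(rms(b) / rms(a))

def matching_gain(a, b):
    """Linear gain to apply to b so that its RMS level equals a's."""
    return rms(a) / rms(b)

# Hypothetical example: B is the same signal as A, just 20% higher in amplitude
a = [0.5, -0.5, 0.5, -0.5]
b = [0.6, -0.6, 0.6, -0.6]
b_matched = [s * matching_gain(a, b) for s in b]  # B after level matching
```

With these toy inputs, B starts out about 1.6dB louder than A, and `b_matched` sits at the same RMS level as A.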
What I am interested in is whether there is actual perception bias affecting soundstage/timbre/etc. when the listener is trained in both loudness and notes/scales (pitch).
In those instances, consider those who play in a symphony orchestra and must play note- and nuance-perfect across passages of different loudness, whether solo, as part of their section or with the whole orchestra; or those who are master tuners of instruments by ear.
Both groups contradict the proposal that volume by default affects and changes a listener's perception of sound quality, pitch, etc., and there are plenty of other examples.
The problem is that this needs further study, so it seems to me we are currently having to make assumptions based on the studies that have been done. That means we need to be careful how we suggest loudness affects a listener, and we cannot generalise beyond my points above on why, or when, we need to level match.
That said, I do wonder (again, not tested) whether a bias comes into effect for soundstage/timbre once the volume increase falls below a listener's accurate perception of volume differences. Someone may accurately tell there is a 0.3dB to 0.5dB volume difference and, as far as they are concerned, the sound quality is identical; but what happens at, say, 0.2dB? Nothing, or does perception become affected (this would be a marginal difference), even for trained listeners?
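To put those small steps in perspective, a short sketch converting dB differences into voltage (amplitude) and power ratios using the standard definitions, 20·log10 for amplitude and 10·log10 for power. The helper names are just for illustration.

```python
def db_to_amplitude_ratio(db):
    """Voltage/amplitude ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB."""
    return 10 ** (db / 10)

for db in (0.2, 0.3, 0.5):
    print(f"{db} dB -> amplitude x{db_to_amplitude_ratio(db):.4f}, "
          f"power x{db_to_power_ratio(db):.4f}")
```

So the 0.2dB case being asked about is only around a 2.3% change in voltage, which shows how tight the matching has to be before it gets below the differences listeners report hearing.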
Another aspect to consider, which shows how much level matching fluctuates in reality:
The output impedance of the amp against the loading by the speaker (an average amp will fluctuate between 0.3dB and 0.7dB, with tubes being greater). Speakers do not have uniform sensitivity, although manufacturers suggest they do, and critically they behave much worse with music; this can vary by up to 6dB peak to dip. One could then expand this to look at Class D (beyond Hypex and Primare, both of which have exceptionally low output impedance across frequency), frequency suck-out in speakers, etc.
However, the point to consider here is important: these effects act on only part of the frequency range, so they appear as a fluctuation within the music, rather than being flat across the whole frequency range and therefore more linear.
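The output impedance point can be illustrated with a rough sketch. The amp's output impedance and the speaker's impedance form a voltage divider, and because speaker impedance varies with frequency, the resulting level change is frequency-dependent rather than flat. This assumes purely resistive impedances (real speaker loads are reactive), and the impedance curve and output impedance values below are hypothetical, purely for illustration.

```python
import math

def level_at_load_db(z_out, z_load):
    """Level at the speaker terminals relative to the amp's open-circuit
    output, in dB, treating both impedances as resistive (a simplification)."""
    return 20 * math.log10(z_load / (z_load + z_out))

def response_variation_db(z_out, impedance_curve):
    """Peak-to-dip level variation in dB caused by z_out across a speaker
    impedance curve given as {frequency_hz: impedance_ohms}."""
    levels = [level_at_load_db(z_out, z) for z in impedance_curve.values()]
    return max(levels) - min(levels)

# Hypothetical impedance curve for a nominal 8-ohm speaker (illustrative only)
speaker = {30: 25.0, 100: 6.0, 1000: 10.0, 3000: 4.0, 10000: 12.0}

# Hypothetical output impedances: very low, modest, and tube-like
for z_out in (0.05, 0.5, 2.0):
    print(f"Zout={z_out} ohm: {response_variation_db(z_out, speaker):.2f} dB variation")
```

With these made-up numbers, the very low output impedance gives well under 0.1dB of variation, while the tube-like value gives several dB, concentrated where the impedance dips.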
Cheers
Orb