Ethan,
That was my point about cabinet resonance, driver issues, etc.
You mention frequency response, distortion, noise and time-based errors.
However, I have now pointed out a fundamental requirement relating to the time domain. This means your frequency response is just collected data that has no meaning unless it is integral to a specific test, in this instance a more complex test that provides waterfall plot (cumulative spectral decay) data. On top of this, speaker sound is also compounded by the impedance/phase of the speaker and by the power amp's output impedance (a critical factor), power supply stage and output stage design.
Again, this affects frequency response.
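To put a rough number on the output impedance point: the amp's output impedance and the speaker's impedance form a voltage divider, so the level at the speaker terminals follows the speaker's impedance curve. This is a minimal sketch assuming the amp behaves as an ideal voltage source behind a series output impedance; the impedance curve and both output impedances are made-up illustrative values, not measurements of any real amp or speaker. Complex numbers carry the phase.

```python
import math
from typing import Sequence

def level_db(z_speaker: complex, z_out: complex) -> float:
    """Level at the speaker terminals relative to the amp's unloaded output, in dB.

    Assumes an ideal voltage source behind a series output impedance z_out
    driving z_speaker, i.e. a simple voltage divider.
    """
    return 20 * math.log10(abs(z_speaker / (z_speaker + z_out)))

def response_spread_db(impedance_curve: Sequence[complex], z_out: complex) -> float:
    """Peak-to-peak level deviation (dB) the divider causes across the impedance curve."""
    levels = [level_db(z, z_out) for z in impedance_curve]
    return max(levels) - min(levels)

# Hypothetical speaker impedance at a few frequencies (ohms; imaginary part = reactance).
speaker_curve = [3 + 0j, 6 - 4j, 20 + 5j, 8 + 0j]

# Two hypothetical output impedances, one high and one low.
for label, z_out in [("1.0 ohm output", 1.0), ("0.05 ohm output", 0.05)]:
    print(label, round(response_spread_db(speaker_curve, z_out), 2), "dB")
```

With these invented values the 1 ohm source produces roughly a 2 dB swing across the curve, while the 0.05 ohm source stays near 0.1 dB, even though both amps would measure flat into a resistive dummy load.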
The problem, IMO, is that you can say frequency response covers the majority of measurements, but it has absolutely no meaning outside of its test scope, because much of what I mention above is frequency-related but tested in different ways and has different implications.
It may seem I am being picky, but this is critical, as I often see people assume that the simple single frequency response tells us everything about speakers, preamps and DACs (DACs do sound different when you consider the various reconstruction filters, and possibly quantisation/aliasing that can throw ultrasonics into the following preamp or power amp).
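On the reconstruction filter point: a sampled signal contains, besides the wanted tone f, image copies at k·fs ± f, and removing those is the reconstruction filter's job. How much, and how steeply, each filter attenuates them is where implementations differ. This sketch only lists where the images land; the 44.1 kHz rate and 20 kHz tone are example values.

```python
def image_frequencies(f_hz, fs_hz, k_max):
    """Image frequencies k*fs - f and k*fs + f for k = 1..k_max, sorted ascending."""
    images = []
    for k in range(1, k_max + 1):
        images.append(k * fs_hz - f_hz)
        images.append(k * fs_hz + f_hz)
    return sorted(images)

# A 20 kHz tone sampled at 44.1 kHz: the first image sits only 4.1 kHz above the tone.
print(image_frequencies(20000, 44100, 2))
```

The closeness of that first image to the audio band is why filter steepness and the choice of filter type are real design trade-offs rather than a detail.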
Relating to driver breakup, again IMO the problem is that a waterfall plot will show the implications of cabinet and driver behaviour more clearly than a distortion measurement.
Why?
Look at a speaker's distortion measurements at 90 dB at 1 metre, and then look at the waterfall plot: the only one that tells you there is a potential problem is the waterfall plot.
And again, distortion does not tell us what is occurring in the time domain, where we can see that it actually fluctuates and that not all of it behaves equally.
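To show what a waterfall slice actually is: each slice zeroes the impulse response before a later start time and takes the spectrum of what remains, so energy stored in a resonance keeps showing up in later slices. This is a minimal pure-Python sketch under simplifying assumptions: a made-up impulse response, a plain rectangular gate with no end window or rise time, and one frequency bin at a time. Real measurement packages typically add windowing and smoothing.

```python
import cmath
import math

def bin_level_db(frame, k):
    """Level in dB of DFT bin k of `frame` (naive DFT; fine for short frames)."""
    n = len(frame)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
    return 20 * math.log10(abs(acc) + 1e-15)

def csd_at_bin(ir, k, starts):
    """Cumulative spectral decay at one bin: for each start index, zero the
    impulse response before it and measure the spectrum of the remainder."""
    return [bin_level_db([0.0] * s + ir[s:], k) for s in starts]

# Hypothetical impulse response: a fast broadband decay plus a small,
# slowly decaying resonance sitting exactly on bin 20.
N = 128
ir = [math.exp(-i / 2) + 0.05 * math.exp(-i / 40) * math.sin(2 * math.pi * 20 * i / N)
      for i in range(N)]

starts = [0, 10, 20, 30]
resonant = csd_at_bin(ir, 20, starts)  # bin with stored energy
clean = csd_at_bin(ir, 50, starts)     # bin without a resonance
print("drop at bin 20:", round(resonant[0] - resonant[-1], 1), "dB")
print("drop at bin 50:", round(clean[0] - clean[-1], 1), "dB")
```

The resonance is only a small fraction of the main response, yet its bin decays far more slowly than its neighbour; that lingering energy is the time-domain behaviour a single steady-state figure does not describe.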
The big question, though, Ethan, is how do you prove your hypothesis if you do not find several reviews that reach the same conclusion or perception of a speaker's sound and then compare them to the speaker's detailed measurements?
Even HK have to correlate measurements to a listener's perception in their testing.
You could be right, but it really is important to try to match that hypothesis (because it is not proven just yet) to measurements and to how they match a listener's perception.
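Matching measurements to perception can start as simply as a correlation between one measured metric per speaker and the listeners' average rating of that speaker. This is only a sketch of the arithmetic (Pearson's r); the numbers in the usage lines are purely hypothetical placeholders, and a correlation is a first check rather than proof.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical, illustrative numbers only: a measured metric per speaker
# (e.g. a decay time read off the waterfall) and that speaker's average rating.
metric = [1.2, 0.8, 2.5, 1.9]
rating = [6.5, 7.4, 4.1, 5.0]
print(round(pearson_r(metric, rating), 2))
```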
Relating to the NAD and ARC Ref 5, I am tempted to be very cheeky.

Heck yeah, why not, hahaha. If we go with your point about the 60k load, and even add sensitivity, impedance matching, etc., then this suggests all sources, preamps and power amps will sound different even when they measure well, and this has nothing to do with frequency range, distortion or errors; and yet you argue that all preamps and power amps with flat FR and low noise are sonically transparent.

More seriously, the measurements done by Paul stopped at the 4th harmonic because the dominant one (the 2nd) came before that, and beyond the 4th the harmonics were insignificant (even the 3rd and 4th were very low).
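Why stopping at the 4th harmonic is reasonable when the higher ones are tiny: THD is the root-sum-square of the harmonic amplitudes relative to the fundamental, so a component well below the dominant harmonic adds almost nothing. The harmonic levels below are hypothetical illustration values, not Paul's data.

```python
import math

def thd_percent(harmonics_db):
    """THD in percent from harmonic levels given in dB relative to the fundamental."""
    # Convert each level to power (10^(dB/10)), sum, then take the square root.
    return 100 * math.sqrt(sum(10 ** (h / 10) for h in harmonics_db))

# Hypothetical levels (dB re fundamental) for the 2nd, 3rd, 4th and 5th harmonics.
levels = [-60, -75, -80, -95]
thd_to_4th = thd_percent(levels[:3])
thd_to_5th = thd_percent(levels)
print(f"THD to 4th: {thd_to_4th:.4f} %, to 5th: {thd_to_5th:.4f} %")
```

With a dominant 2nd harmonic at -60 dB, adding a 5th at -95 dB changes the result only in the fifth decimal place of the percentage.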
Also, I agree up to a point about the 60k load, but then frequency-response-related measurements are not exactly accurate either, as they usually take a single sinewave, typically at 1 kHz.
Also, I have yet to see any test procedure using sine sweeps that comes anywhere near, say, a musical chord from one instrument, which is vastly more complex.
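To illustrate why a more complex signal can reveal things a single tone cannot: pass one tone, then two tones, through the same mildly nonlinear stage. The two-tone input produces sum and difference products that simply do not exist with one tone. The nonlinearity (y = x + 0.1·x²) is a made-up textbook example, not a model of any particular amp, and two tones are still far simpler than a chord.

```python
import cmath
import math

N = 256  # samples per frame; tones are placed exactly on DFT bins

def tone(k, amp=0.5):
    """A sine wave completing exactly k cycles in N samples."""
    return [amp * math.sin(2 * math.pi * k * n / N) for n in range(N)]

def nonlinear(x, a2=0.1):
    """Hypothetical weak second-order nonlinearity: y = x + a2 * x^2."""
    return [v + a2 * v * v for v in x]

def amplitude_at_bin(signal, k):
    """Amplitude of the sinusoidal component at DFT bin k (0 < k < N/2)."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(signal))
    return 2 * abs(acc) / N

single = nonlinear(tone(20))
twin = nonlinear([a + b for a, b in zip(tone(20), tone(23))])

# Difference (23 - 20 = 3) and sum (20 + 23 = 43) products appear only with two tones.
for label, sig in [("single tone", single), ("two tones", twin)]:
    print(label, "bin 3:", round(amplitude_at_bin(sig, 3), 4),
          "bin 43:", round(amplitude_at_bin(sig, 43), 4))
```

The single tone only shows its 2nd harmonic (bin 40); the two-tone case adds products at bins 3 and 43, and a real chord with many partials would generate many more.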
That said, as I mentioned before, the 60k load is a long stretch as an explanation for why a zero-feedback tube design and a cheaply implemented feedback design sound different, going by Jeff's descriptions.
Especially when the distortion and FR measurements show a negligible difference, unless you think changing the load will dramatically alter this over 20 Hz to 20 kHz?
If so, I would be interested to know how, and that's the important part of the debate.
If the 60k load is an issue, how will it affect the negligible differences in distortion, FR and noise that you mention, bearing in mind that, as we see, the two preamps have very stable measurements?
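One way to put the load question in concrete terms: for a source with a series output resistance and an output coupling capacitor (often found in tube preamps), the load sets both a small broadband divider loss and the low-frequency roll-off. The 500 ohm output resistance and 4.7 uF capacitor below are hypothetical values, not the NAD's or ARC's specifications.

```python
import math

def level_into_load_db(f_hz, r_out, c_out, r_load):
    """Level (dB) across a resistive load r_load, fed from a source with series
    output resistance r_out and series coupling capacitor c_out."""
    z_cap = 1 / (2j * math.pi * f_hz * c_out)  # coupling capacitor impedance
    return 20 * math.log10(abs(r_load / (r_load + r_out + z_cap)))

# Hypothetical preamp output: 500 ohm output resistance, 4.7 uF coupling capacitor.
R_OUT, C_OUT = 500.0, 4.7e-6
for r_load in (60_000.0, 10_000.0):
    lo = level_into_load_db(20, R_OUT, C_OUT, r_load)
    hi = level_into_load_db(20_000, R_OUT, C_OUT, r_load)
    print(f"{r_load / 1000:.0f}k load: {lo:.2f} dB at 20 Hz, {hi:.2f} dB at 20 kHz")
```

With these made-up values the 60k and 10k loads differ by about 0.46 dB at 20 Hz and about 0.35 dB at 20 kHz; a higher output resistance or a smaller capacitor would widen that gap, so the answer depends on the actual output stage.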
This does not square with your statement:
The measurements ARE comparable in terms of audibility, unless you're now using the subjectivist argument and saying we are not measuring everything between the NAD and ARC that would show differences.

And this would then skew the argument, possibly towards a bias, because ALL measurements done historically by Soundstage/SP/Hi-Fi News etc. use a specific load, sinewave, etc., and are better than anything I have seen users or manufacturers do.
This is important because you're arguing that the measurements I provided may not have meaning while also saying that well-designed amps and preamps are sonically transparent, when I doubt anyone will state that a cheap solid-state preamp designed with feedback sounds the same as a zero-feedback tube preamp, as in my example, where IMO the two are comparable enough in their measurements to be used for this debate.
I could expand the debate to DACs that are flat from 20 Hz to 20 kHz yet can subtly sound different due to different filter implementations, or to the engineering debate about negative feedback, where two opposing, well-regarded engineers show that negative feedback does affect an amp's behaviour: one (Nelson) takes the position of using none, while the other, a supporter, argues that it should be used heavily (although Bruno also goes on to say that well-implemented negative feedback is not cheap).
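The feedback point can be made concrete with the textbook relations: closed-loop gain is A/(1 + Aβ), and distortion generated inside the loop is reduced, to a first approximation, by the factor |1 + Aβ|. Because open-loop gain usually falls with frequency, that correction weakens towards the top of the audio band. The single-pole model and all numbers below are hypothetical, chosen only to show the trend; they are not taken from Nelson's or Bruno's designs.

```python
def closed_loop(a_open: complex, beta: float) -> complex:
    """Closed-loop gain of a negative-feedback amplifier: A / (1 + A*beta)."""
    return a_open / (1 + a_open * beta)

def distortion_reduction(a_open: complex, beta: float) -> float:
    """First-order factor by which feedback divides open-loop distortion: |1 + A*beta|."""
    return abs(1 + a_open * beta)

def open_loop_gain(f_hz: float, a0: float, f_pole_hz: float) -> complex:
    """Single-pole open-loop gain model: A(f) = A0 / (1 + j*f/f_pole)."""
    return a0 / (1 + 1j * f_hz / f_pole_hz)

# Hypothetical amp: open-loop gain 10,000 with its first pole at 100 Hz,
# feedback fraction 0.05 (closed-loop gain of about 20).
A0, F_POLE, BETA = 10_000.0, 100.0, 0.05
for f in (20.0, 1_000.0, 20_000.0):
    a = open_loop_gain(f, A0, F_POLE)
    print(f"{f:>8.0f} Hz: closed-loop gain {abs(closed_loop(a, BETA)):.2f}, "
          f"distortion reduced ~{distortion_reduction(a, BETA):.1f}x")
```

With these values feedback divides distortion by about 50 at 1 kHz but only by about 2.7 at 20 kHz, which shows why how much feedback is used, and how it is implemented, matters as much as whether it is used at all.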
I could go on; for me it's an interesting debate, and I hope you continue discussing this, Ethan, with maybe Jeff/Myles/Amir/Don/etc. piping in, as it would make sense for them to correct me where I have made mistakes or add their views on the specifics we are talking about.
But at some point we do need to correlate good listeners' perceptions and reviews with the real measurements of those products and show how they fit either hypothesis (I agree that even the data and measurements I am showing do not go far enough to answer the questions outlined by Jeff).
Thanks
Orb