I do not necessarily disagree, Don.
But look at two very good power amps with very low output impedance: they can be audibly transparent and flat in their FR, yet run a real load simulation and this changes, as Stereophile's measurements have shown.
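One way to see why the load matters is to treat the amp as an ideal source behind its output impedance Z_out, driving the load impedance Z_load(f) as a voltage divider. A resistive load gives the same level at every frequency, but a load whose impedance swings with frequency turns even a small Z_out into an FR deviation. This is a minimal sketch under that simplified model; the load values are made up for illustration and are not Stereophile's simulated loudspeaker load or any real amplifier's figures.

```python
import math

def level_db(z_out: complex, z_load: complex) -> float:
    """Level at the load relative to the open-circuit amp output, in dB.

    Simplified model: ideal source behind z_out driving z_load, so
    V_load / V_src = Z_load / (Z_load + Z_out).
    """
    return 20 * math.log10(abs(z_load / (z_load + z_out)))

def fr_deviation_db(z_out: complex, loads: list[complex]) -> float:
    """Peak-to-peak level variation (dB) across a set of load impedances."""
    levels = [level_db(z_out, z) for z in loads]
    return max(levels) - min(levels)

# Hypothetical load impedances (ohms) at a few frequencies; a real or
# simulated speaker load would be modelled across the whole audio band.
speaker_like = [complex(3.0, -1.5), complex(6.0, 0.0), complex(20.0, 8.0)]
resistive = [complex(8.0, 0.0)] * 3

for z_out in (0.05, 0.5):
    print(f"Zout={z_out} ohm: resistive {fr_deviation_db(z_out, resistive):.3f} dB, "
          f"speaker-like {fr_deviation_db(z_out, speaker_like):.3f} dB")
```

The resistive case always shows zero deviation, which is why two amps can measure identically on a bench load and still diverge on a reactive one.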
I use this as a simple example. The other simple example is Nelson's complex IM distortion test, where he uses not two tones but, I think, 7 or 8 (and even this does not reflect a simulated real scenario). Just curious: have you managed to identify by ear an IM distortion difference between two preamps whose worst individual harmonic distortion is around 0.02% for both, using just a couple of tones (the key being that the products relate to excellent existing measurements)?
Admittedly, this again goes against some of the points Ethan has made historically, that measurements are more accurate than the ear (a position argued quite vocally by others in different forums; I am not suggesting Ethan is one of those).
Music is a problem in any test; bear in mind I did emphasise that a worthwhile test must be repeatable and must trigger a specific response/behaviour (I think I am the only poster so far to mention this).
But without a simulated scenario (really just a real-time, practical version of modelling), we may not fully see how two products that measure the same can still differ; for example, the NAD and ARC I provided earlier.
I agree it is easy to find small differences between OK-to-good products, but when we are talking about the very good ones, with very good design and implementation, it looks as if they should sound the same.
If necessary, I could find more products that, judging from existing measurements, should sound the same but use very different topologies.
That's why, instead of actual music, I am pondering a repeatable test using major chords at various points on the musical scale (this would not necessarily help identify a behaviour, trait or problem, but it could go some way towards a realistic simulation).
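To get a feel for what such a chord test would exercise, here is a minimal sketch that builds major triads in 12-tone equal temperament and lists the second-order IMD products (sum and difference tones) a nonlinearity would add, along with where each product falls relative to the nearest equal-tempered note. The A4 = 440 Hz reference and MIDI note numbering are assumptions chosen for illustration, not part of the proposal above.

```python
import math

A4_HZ = 440.0  # assumed tuning reference (concert A)
A4_MIDI = 69   # MIDI note number of A4 (assumed numbering convention)

def note_hz(midi: int) -> float:
    """12-TET frequency: each semitone step multiplies by 2**(1/12)."""
    return A4_HZ * 2 ** ((midi - A4_MIDI) / 12)

def major_triad(root_midi: int) -> list[float]:
    """Root, major third (+4 semitones) and perfect fifth (+7 semitones)."""
    return [note_hz(root_midi + step) for step in (0, 4, 7)]

def second_order_imd(freqs: list[float]) -> list[float]:
    """Difference (fj - fi) and sum (fi + fj) tones for every pair of notes."""
    products = []
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            products += [abs(freqs[j] - freqs[i]), freqs[i] + freqs[j]]
    return sorted(products)

def nearest_note_offset_cents(f: float) -> tuple[int, float]:
    """Nearest 12-TET note (as a MIDI number) and f's offset from it in cents."""
    semitones = 12 * math.log2(f / A4_HZ)  # fractional semitones from A4
    nearest = round(semitones)
    return A4_MIDI + nearest, 100 * (semitones - nearest)

# Walk a major triad up the scale and report where its IMD products land.
for root in (48, 60, 72):  # C3, C4, C5
    for p in second_order_imd(major_triad(root)):
        note, cents = nearest_note_offset_cents(p)
        print(f"root {root}: product {p:8.2f} Hz -> MIDI {note} {cents:+6.1f} cents")
```

Products that sit well off the scale grid are the ones most likely to stand out as non-harmonic against the chord.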
I doubt you disagree with any of the above.
Cheers
Orb
Hmmm... Disclaimer: despite appearances, I am not really that interested in this thread; it is too debatable given limited/inaccurate knowledge, IMO. Now to your points:
1. I left off test loads because I was lazy. In fact, you could argue there is a continuum running from a simple, static, purely resistive load to in-room measurements, and everything between those two cases will always be open to debate. Even an in-room measurement only answers the question for that one test case (i.e. that one audiophile and maybe a friend or two, if you are lucky).
2. I have not performed tests (listening or measurements) on preamps at that level, that I recall, or at any time recently. Nor do I have any particular interest in doing so; I feel fairly certain I would be crucified no matter the findings. There are a lot of parameters going into such a test, such as which tones, what level, etc.
3. I do believe that measurements are more accurate than the ears. I can easily (well, with the right equipment and set-up) measure distortion spurs below -100 dBFS, and I really doubt anybody can hear distortion at that level. As I've said, I've been wrong before... I could certainly generate a number of tones, apply them to a couple of components, then look at the output on a spectrum analyzer (or analysis system) and find the IMD products (and all the rest). Taking the right measurements in the right environment is the problem, IMO.
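As a rough illustration of "generate tones, look at the output spectrum, find the IMD products", here is a minimal pure-Python sketch. The device under test is a hypothetical weak nonlinearity y = x + a2·x² + a3·x³ standing in for a real component, and the tone bins and record length are assumed values. Both tones complete a whole number of cycles in the record, so each product can be read from a single DFT bin without windowing. A helper also converts percent distortion to dB: 0.02% is about -74 dB, while -100 dB corresponds to 0.001%.

```python
import cmath
import math

N = 4096           # record length in samples; assumed
K1, K2 = 401, 457  # tone bins (primes), low enough that all products stay below N/2; assumed

def dut(x: float, a2: float = 1e-4, a3: float = 1e-4) -> float:
    """Hypothetical device under test: weak 2nd- and 3rd-order nonlinearity."""
    return x + a2 * x * x + a3 * x ** 3

def bin_amplitude(signal: list[float], k: int) -> float:
    """Peak amplitude of the sinusoid sitting exactly on DFT bin k (single-bin DFT)."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal))
    return 2 * abs(acc) / n

def percent_to_db(pct: float) -> float:
    """Convert a distortion percentage to dB relative to the fundamental."""
    return 20 * math.log10(pct / 100)

# Two equal tones at half amplitude each, so the combined peak stays within +/-1.
x = [0.5 * math.sin(2 * math.pi * K1 * i / N) + 0.5 * math.sin(2 * math.pi * K2 * i / N)
     for i in range(N)]
y = [dut(v) for v in x]

ref = bin_amplitude(y, K1)
for name, k in [("f2-f1", K2 - K1), ("2f1-f2", 2 * K1 - K2),
                ("2f2-f1", 2 * K2 - K1), ("f1+f2", K1 + K2)]:
    print(f"{name:7s} bin {k:5d}: {20 * math.log10(bin_amplitude(y, k) / ref):7.1f} dB re f1")
print(f"0.02% = {percent_to_db(0.02):.1f} dB")
```

A real analyzer would capture the component's output rather than a formula, but the bookkeeping (which bins hold which products, and how far down they are) is the same.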
4. One of the biggest issues with music as a test source is that it constantly changes, making it very difficult to capture enough samples within a given time frame to get down to the measurement noise floor. One solution has been to take a segment and replicate it, but of course there are end effects that must be compensated through windowing. The whole process gets complex, and at the end you'll have the highbrows at one end trying to refine it further and the golden-ear crew at the other yelling it ain't real enough...
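A small sketch of why windowing matters: a captured segment almost never holds a whole number of cycles of every tone (music certainly won't), so the jump at the segment ends smears energy across the whole spectrum and can bury low-level distortion products. Here a single test tone is deliberately placed between bins (100.5 cycles per segment, an assumed value), and the leakage 200 bins away is compared for a rectangular (unwindowed) segment and a periodic Hann window. Segment length and bin choices are illustrative assumptions.

```python
import cmath
import math

N = 1024          # segment length in samples; assumed
TONE_BIN = 100.5  # deliberately non-integer: the segment holds 100.5 cycles

def hann(n: int) -> list[float]:
    """Periodic Hann window: w[i] = 0.5 - 0.5*cos(2*pi*i/n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def bin_mag(signal: list[float], k: int) -> float:
    """Magnitude of DFT bin k."""
    n = len(signal)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(signal)))

def leakage_db(window: list[float], far_bin: int) -> float:
    """Level at far_bin relative to the larger of the two bins around the tone, in dB."""
    x = [w * math.sin(2 * math.pi * TONE_BIN * i / N) for i, w in enumerate(window)]
    peak = max(bin_mag(x, k) for k in (100, 101))
    return 20 * math.log10(bin_mag(x, far_bin) / peak)

rect = [1.0] * N
for name, win in (("rectangular", rect), ("Hann", hann(N))):
    print(f"{name:11s}: leakage 200 bins away = {leakage_db(win, 300):6.1f} dB")
```

The trade-off is that a window widens the main lobe and slightly attenuates off-bin tones, which is part of why refining the process never quite ends.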
5. We are pretty sensitive to non-harmonic tones, which is why IMD is often considered much more important than THD. You can use as many tones as you want, as long as you watch the loading so the peaks don't exceed the system's headroom. It might be interesting to start with two tones at a given level of IMD, then gradually add tones (carefully selected, e.g. using prime numbers or following IEEE Std 1241) and see where the IMD is masked. Repeat for a range of levels. Report back in a few weeks. Maybe an NPR (noise power ratio) test would be useful... Using musical chords should show the same thing, since our 12-note scale is based upon fractional powers of two (you must choose the right scale, of course, otherwise you get non-harmonic tones, one reason it is tough to play with a piano). That is, a "nice" chord has only tones related to the notes in the chord and their overtones; a "bad"-sounding chord will have other tones creeping in. As Ethan noted, if you pick the right tones it is fairly easy to hear the IMD tones; they really do stand out (to our ears and to our instruments).
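Here is a minimal sketch of the tone-selection bookkeeping. It picks tone frequencies as prime DFT bin numbers in a power-of-two record; since an odd prime shares no factor with a power of two, each tone completes a whole number of cycles that is relatively prime to the record length, which is the coherent-sampling idea IEEE Std 1241 recommends (as I understand it). It then lists which tone bins are hit by 2nd- and 3rd-order IMD products, since a product landing on a tone cannot be measured separately, and estimates how far to scale the tones so the summed peak stays within full scale. The record length, starting bin and spacing are assumed values; zero phases are used for simplicity.

```python
import math
from itertools import combinations

N = 8192  # record length in samples (power of two); assumed

def is_prime(n: int) -> bool:
    """Trial-division primality test (fine for bin numbers this small)."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

def prime_bins(start: int, count: int, min_gap: int) -> list[int]:
    """First `count` primes >= start, each at least `min_gap` bins above the last."""
    bins, k = [], start
    while len(bins) < count:
        if is_prime(k) and (not bins or k - bins[-1] >= min_gap):
            bins.append(k)
        k += 1
    return bins

def imd_bins(tones: list[int]) -> set[int]:
    """Bins of 2nd-order (a+-b) and 3rd-order (2a+-b, a+-b+-c) products."""
    prods: set[int] = set()
    for a, b in combinations(tones, 2):
        prods |= {a + b, abs(a - b), abs(2 * a - b), abs(2 * b - a), 2 * a + b, 2 * b + a}
    for a, b, c in combinations(tones, 3):
        prods |= {a + b + c, abs(a + b - c), abs(a - b + c), abs(-a + b + c)}
    prods.discard(0)
    return prods

def collisions(tones: list[int]) -> set[int]:
    """Tone bins that some IMD product lands on (those products get masked)."""
    return imd_bins(tones) & set(tones)

def peak_of_unit_tones(tones: list[int]) -> float:
    """Peak |x| of a sum of unit-amplitude, zero-phase sines over one record."""
    return max(abs(sum(math.sin(2 * math.pi * k * i / N) for k in tones)) for i in range(N))

tones = prime_bins(start=101, count=3, min_gap=40)
print("tone bins:", tones)
print("tones hit by IMD products:", sorted(collisions(tones)) or "none")
print(f"scale each tone by <= {1 / peak_of_unit_tones(tones):.3f} to stay within full scale")
```

With odd (prime) tone bins, every second-order product is even and so can never land on a tone; third-order products are odd and can, which is why the check matters more as tones are added.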
6. I am running simulations now (well, at several GHz, but...). The list of simulations that could be performed is almost endless, but of course I eventually have to ship the product. I do not think it is all that hard to develop simulations or measurements showing why two products sound different; to my mind it is more often the environment at fault than the limitations of the equipment. One partial solution is to refine the test conditions, e.g. more realistic loads (see 1. above). The IHF and FTC have both worked on that, but of course there has to be a limit to the number of test cases.
So no, I don't think we disagree, but how far we can take this in the real world is always an issue. - Don