Materials and methodology 101. It is a problem that you can't eliminate, can you?
Well, it is controlled a hell of a lot better here than when you do your own evaluations. Speakers are shuffled to the exact same position every time. When you do a personal evaluation, you are doing it at a different time and in a different space. So if there is an issue here, it renders ad-hoc evaluations completely useless!
By long-term familiarity with a system. Ever notice how you can tell when something's wrong with your system?
No, I don't. The longer the period, the less I remember small differences. Even larger differences can be forgotten or imagined. Do you have a listening test where accuracy improves as time goes by? I have tested myself many times and I am confident of what I know relative to my detection ability. Detection is best for me when the switch time is well under a second.
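For anyone who wants to score their own quick-switch trials, here is a minimal sketch of the usual arithmetic: an exact one-sided binomial test in plain Python. The function name and the 12-of-16 example are just for illustration, not from any particular ABX tool.

from math import comb

def abx_p_value(correct, trials):
    # Chance of getting at least this many right by pure guessing (p = 0.5 per trial).
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))  # 12 of 16 correct -> p ~ 0.038, under the usual 0.05 threshold

If the number comes out much higher than 0.05, the safest reading is that the run was guessing, no matter how confident it felt.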
The other problem with long-term testing is adaptation. The brain learns to forget artifacts. Take a projector with a noisy fan. When you first walk in, you instantly hear it. Start watching the movie and after a while you "forget" about it and don't even notice it is running.
Again, how can you tell that the ML isn't telling you what your source really sounds like? Based on what you've posted, that original signal has undergone so much processing that one wonders whether it really sounds like the original recording.
There are two domains: what led to the capture of the content, and what leads to its reproduction. We can't connect one end to the other, but we can try our best to reproduce the common ground in the middle. Would you buy an amplifier that had the frequency response of this Martin Logan speaker (M) below?
For you to be right, we would need to ignore both the listening test results and the measurements. That is a pretty big hurdle. It could be cleared if you had data, but a hypothesis that runs against the facts is not data.
And if we designed by the numbers and eliminated the human response, we'd be stuck with more halls like those designed by Cyril Harris. They should have sounded perfect but sounded like dreck from day one.
I don't know Cyril Harris, but I do know it is true that sometimes perfection sucks. An anechoic chamber brings out the best in a speaker, yet it is not a pleasant space for enjoying music. Our brain is so used to reflections in a room that it uses them as cues for intelligibility and a sense of normalcy. So yes, we do want what is pleasant and preferable to us, and this test shows that result. It nullifies many of the variables and, as such, yields much more accurate evaluations than our own ad-hoc testing. It is not perfect by any means, but the conclusions lead to certain measurements, which in turn lead to better designs.
So it's better to use a recording that the participants know nothing about and make a judgement about the sound of the recording? How do they know what's right?
I can't explain it to you any better than Dr. Toole did. You sit there wondering about this, yet have little trouble scoring down the poor-sounding speaker. You think you are alone in that and it must be your taste. Then you see a bunch of other people voting like you, and you realize the power our brain has in interpreting what must be right. A boomy bass is just that: boomy. It is not right, and you can tell that. An amplifier clipping is wrong too, and we can tell that with no reference.
Just wondering if you've ever done a live vs. recorded event? If not, you might find it interesting, when you can sit close to a simple instrument like a guitar, just how badly digital recreates the sound and harmonic envelope of the instrument.
That is unrelated to this topic. All the speakers receive the same signal. If one speaker sounds worse than another on digital material, it is the fault of the speaker. If there are speakers sold for analog sources only, I do not know of them.
Bottom line is that these are far from perfect experiments and should be taken with a grain of salt, knowing what we know about hearing, psychology, physiology, statistics, etc. (For instance, to demonstrate any scientifically valid statistical effect with 50 people, knowing what we know about interaural differences, is basically ridiculous, i.e. the results are rigged before the study even begins.)
I am not one to just jump on a bandwagon of this type. I have studied this work up and down. I have spoken at length with the researchers and sat through the testing myself. This is work that is presented at major conferences and in journals such as those of the AES and ASA. It represents major advances in how we should design speakers. If it were so ridiculous, there would have been countless papers saying so. Instead, there is nothing but respect for this work.
There is always this cry that what we hear does not correlate with what we measure. Well, here is work that connects the two with high confidence. It is remarkable that something this complex can be, at the end of the day, very well understood. And this is not theory or one data point. Countless speakers have been designed using this methodology, proving its efficacy. The same system has been used to evaluate and prove the fidelity of car audio for billion-dollar contracts.
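For the curious, the published regression model from this research (Sean Olive's 2004 AES paper) boils that connection down to four measured quantities. The sketch below is my paraphrase of it, so treat the variable names and descriptions as approximate rather than gospel:

def predicted_preference(nbd_on, nbd_pir, lfx, sm_pir):
    # nbd_on  - narrow-band deviation of the on-axis response (dB)
    # nbd_pir - narrow-band deviation of the predicted in-room response (dB)
    # lfx     - low-frequency extension, log10 of the -6 dB point (lower = deeper bass)
    # sm_pir  - smoothness of the predicted in-room response (0 to 1)
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir

Flat, smooth on-axis and in-room response plus deep bass extension raise the predicted score; that is the whole correlation in one line.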
As I have noted, lack of perfection does not invalidate test results. You have to look to see what can still be learned, and a lot can. Indeed, I am confident any competent speaker designer believes in some or all of this. They may simply not be in a position, given marketing, test and R&D resources, and the desire for differentiation, to follow it.