When I began working for Hi-Fi Choice (1992), we were typically blind testing between 16 and 20 products per month. These would be from a single category (CD players, for example) and were not limited to a single price band. The test group would be played to a panel of listeners (typically three to five) and each device under test (DUT) would be given a test-consistent presentation of recordings designed both to be representative of as many musical genres as possible and to be good for detecting differences.
Strict level matching (to within 0.1dB) was mandatory for any form of audio electronics. It's harder to achieve anything like the same degree of level matching with loudspeakers, however, and best-case matching got to within 1-2dB. Most devices would be given a single presentation, although usually two or three within each group would be resubmitted during the test to determine if the listeners were achieving some form of consistency.
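To put the 0.1dB figure in context, here is a minimal sketch of the arithmetic, assuming levels are compared as RMS voltages measured at the output with the same test tone. The function names and the example voltages are hypothetical, not part of the original test protocol.

```python
import math

def level_difference_db(v_a: float, v_b: float) -> float:
    """Level difference in dB between two RMS voltages (positive if A is louder)."""
    return 20.0 * math.log10(v_a / v_b)

def is_level_matched(v_a: float, v_b: float, tolerance_db: float = 0.1) -> bool:
    """True if the two devices are within the stated matching tolerance."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db

# Voltage ratio that corresponds to exactly 0.1 dB: roughly a 1.2% difference.
ratio_for_tolerance = 10 ** (0.1 / 20)

# Hypothetical readings: 2.00 V against 2.02 V is inside the window,
# 2.00 V against 2.05 V is not.
matched = is_level_matched(2.00, 2.02)
mismatched = is_level_matched(2.00, 2.05)
```

In other words, a 0.1dB window allows only about a 1.2% difference in output voltage, which is why it is practical for electronics and much harder for loudspeakers.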
Audible memory isn't a big problem; in fact it's a bonus if the tester is able to listen past their personal preferences. But the test protocol can have a broad tendency to prefer strong flavours as a result, and the product that does the least harm can be the one that comes roughly eighth in a group of 16. The prevailing problem with any such test is determining the limit on the number of tests possible in any given session by any given listener; this varies from person to person, and from session to session, but we usually found that three or four devices per morning or afternoon session were universally tolerated.
Scientifically speaking, it's fairly easy to pick holes in this. The test should be conducted double-blind, as single-blind still allows the administrator to lead (consciously or otherwise) the listeners. The listeners should test one at a time instead of as a group, because group listening can lead to strong characters steering the test. The number of presentations per product and the number of listeners in the test panel are chronically too small to form anything objectively constructive, etc, etc.
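The point about small numbers can be made concrete with a rough sketch. It assumes something the panel tests above did not strictly do: that each judgment is an independent forced choice with a 50% chance of being right by guessing, as in an ABX-style trial. The function names are hypothetical.

```python
from math import comb

def chance_probability(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability of getting at least `correct` right out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def min_correct_for_significance(trials: int, alpha: float = 0.05, p_guess: float = 0.5):
    """Smallest score that guessing would reach less than `alpha` of the time, or None."""
    for correct in range(trials + 1):
        if chance_probability(correct, trials, p_guess) < alpha:
            return correct
    return None  # No score, not even a perfect one, beats chance at this alpha.

three_listeners = min_correct_for_significance(3)   # None
five_listeners = min_correct_for_significance(5)    # 5
sixteen_trials = min_correct_for_significance(16)   # 12
```

Under those assumptions, a single presentation to three listeners cannot reach the conventional 5% threshold even if all three agree, five listeners must be unanimous, and 16 judgments need at least 12 correct.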
Hi Alan,
Thanks for your reply – I really appreciate it. I understand the impetus behind needing to evaluate a number of products quickly to meet deadlines and print dates, and that the methodology has its limits. That fits my understanding that memory favors events (products) which produce the greatest highs or lows (“flavours”, as you put it). That the product that “does the least harm” would end up in the middle of the bell curve is no surprise.
I still find it difficult to understand how anyone can claim to remain objective as one’s exposure to and experience of audio components increases. My father visited me a number of years ago when I had Living Voice OBX-RWs and I played him Hotel California (his choice, he loves the Eagles). He said “Gosh, it sounds so clear.” Not detailed, transparent, textured, dynamic, palpable, scary real, close to a live mic-feed or any other buzzword. He said “clear”. Why? Because he’s not an audiophile. He lacks the lexicon with which to describe prerecorded music via a hi-fi system. Most people on this forum, though, probably understand what those words represent, and actively employ them when discussing their systems. The lexicon apropos a subculture is highly suggestive of what that particular subculture values. It suggests that when we develop a sophisticated lexicon, we have created (consciously or subconsciously) a value system enabling us to make decisions and judgments apropos the mechanisms by which that subculture expresses itself. Once that is learned, and especially once it has become culturally entrenched, it is almost impossible for it to be unlearned. Therefore, any seasoned audiophile going into a double-blind test already has biases acquired through language. “Warm” is a bias, a judgment, a perception. “Detailed” is a bias. “Linear” is a bias, and we carry all of them with us all the time.
The problem, from where I sit, is that continued experience is likely to reinforce biases (to make us recall easy descriptors), rather than remove them. I find it difficult to fathom how a seasoned audiophile can claim to be objective in the evaluation of an audio product when their evaluation of that product is always going to be informed by their previous experience and their use of language in articulating their experience. No one is approaching this for the first time. We’ve heard too much and formed too many opinions we’ve expressed through phrases like “palpable midrange bloom” and “razor sharp bass”.
We can, of course, retreat to solely relying on measurements, but I know of only one or two people who have assembled a system based on Floyd Toole’s white papers for Harman International.
My current understanding is that memory will force us to go for the path of least resistance first and foremost, and for the audiophile, that means recalling previous highs and lows and articulating their experience through a common culturally-prescribed set of values, predisposing the individual to confirmation bias, backfire effect, overconfidence, etc. That doesn't mean critical listening is impossible, but I could not confidently put myself in a camp where I believed I was above falling into the above traps. I mean, I like vinyl, valves and horns - what would I know?
There's very little on audible memory in double-blind tests for a reason: the psychoacoustics suggest there is no such thing, beyond a few fleeting seconds of being able to hold a sound in our heads. There is no mechanism for long-term storing of sound quality - organised musical structures, speech patterns and noises, yes. But the more subtle temporal, tonal or spatial patterns are not stored. This invites the question of how you can possibly say you prefer one particular interpretation of a piece of music; your audible memory is so short you should be unable to tell one complete performance of a Beethoven movement from another, because by the time you listen to the second version of the movement, you will have forgotten the subtle cues of the first. As such it might be that saying "I prefer the Kleiber version to the von Karajan" is as objectively meaningless as saying "I prefer the Arcam amp over the NAD".
I understand the science of psychoacoustics only on a very superficial level based on what I’ve read. However, I suggest that it can only be at best an incomplete science due to two behaviours I’ve witnessed consistently during the thirty years I’ve been playing, engineering, mixing and (occasionally) mastering music (though only as an amateur).
The first relates to our “inability” to store tonal values. Aside from my own experience, the prevalence of individuals possessing “perfect pitch” – that is, the ability to name individual pitches either on their own or in clustered chords, sing notes accurately without a prior reference, identify when an instrument is out of tune without a prior reference, and, bizarrely, name the pitch of a non-musical source like, say, an alarm – has been well documented and scientifically validated many times over. Absolute pitch is a function of higher-level cognition, thought to occur through the suppression of lower-level brain function, which is why perfect pitch is found in a higher number of individuals with optic nerve hypoplasia or autism. While it is difficult for an individual not born with a predisposition toward perfect pitch to learn it without considerable effort and a high level of maintenance, it’s possible, albeit rare. Anecdotally speaking, I’ve worked with a few who possessed it and it never ceased to amaze me. (I’m okay with notes, chords and keys, but only in the context of playing music and organized musical structures – a guitar tuned to Drop D is an easy one to identify as I’ve played it a lot).
The second relates to our “inability” to store temporal patterns. I’ve done this a few times: The drummer will be in the studio and will be fed a click track through headphones. They’ll play a few bars with the click and then we’ll turn the click off in the phones, having them continue to play, but recording both the drummer and click into ProTools (or Logic, or whatever…) with no music – just recording the drums on their own. It’s almost impossible to do (and I can’t) but I can say without any exaggeration that a few drummers I’ve worked with will maintain the tempo exactly without wavering for several minutes at a time. Not only that, when we match the waveforms they’re so on it’s ridiculous.
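For anyone wanting to put numbers on "matching the waveforms", here is a minimal sketch of one way to do it, assuming the drum hits have already been turned into a list of onset times in seconds (the onset detection itself is not shown) and that the click tempo is known. The function name and the sample figures are hypothetical.

```python
def click_deviations_ms(onsets_s, bpm, first_beat_s=0.0):
    """Deviation of each drum hit from the nearest click, in milliseconds.

    Positive means the hit is late, negative means early. Each hit is
    snapped to the nearest click, so drift of more than half a beat would
    be assigned to the wrong click and under-reported.
    """
    beat_s = 60.0 / bpm
    deviations = []
    for t in onsets_s:
        nearest_click = round((t - first_beat_s) / beat_s)
        click_time = first_beat_s + nearest_click * beat_s
        deviations.append((t - click_time) * 1000.0)
    return deviations

# Hypothetical example: four hits against a 120 bpm click (one click every 0.5 s).
deviations = click_deviations_ms([0.0, 0.502, 1.004, 1.497], bpm=120)
worst_ms = max(abs(d) for d in deviations)
```

A drummer who is "so on it's ridiculous" would show deviations that stay within a few milliseconds and do not grow steadily over the length of the take.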
Both of these suggest (to me) that our understanding of psychoacoustics and auditory perception is limited but developing. Given that neuroscience is still in its infancy relatively speaking, I prefer to take a heuristic approach to hi-fi lest I paint myself into a corner and need to recant my beliefs. (I once owned a full-blown Naim system so I’d prefer not to have to join a cult again in the near future…)