Creating an Audio Dartboard

I think taking a test is a stressor, and because every person is an individual, each will respond to that stress differently. (Just look at some high-priced players who come to the Yankees, like Ed Whitson, and can't stand the pressure of performing before a big audience!)

Now, I think we can agree that a sigmoidal curve describes improvement vs. price for high-end gear: large gains at first, then diminishing returns. And if that improvement comes in an area that really matters to the listener, like dynamics, it can seem larger than it actually is.
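To make the diminishing-returns shape concrete, here is a minimal sketch of a logistic ("sigmoidal") improvement-vs-price curve. The midpoint and steepness parameters are made up for illustration, not measured data:

```python
import math

def perceived_improvement(price, midpoint=2000.0, steepness=0.002):
    """Illustrative logistic curve of improvement vs. price.

    The parameters are arbitrary; only the shape matters:
    steep gains early, a long flat tail at the high end.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (price - midpoint)))

for price in (500, 1000, 2000, 5000, 20000, 50000):
    print(f"${price:>6}: {perceived_improvement(price):.3f}")
```

In this toy model, going from $5,000 to $50,000 buys well under one percent of additional "improvement" -- exactly the last-couple-of-percent territory the athlete analogy below describes.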

But it's like an athlete here too: when an athlete first starts training, there is a wide-open window of opportunity. As they train and improve at the various tasks they must master to compete, that window closes, and one has to become more creative to extract that last couple of percent of performance from the athlete. But it's that last couple of percent, that gain of an inch or two per stride, that over a 100-meter distance makes the difference between winning and losing the race. The same goes for high-end audio: it's about getting the maximum from the system, that last couple of percent that brings one closer to the music and the real event! And unfortunately, it seems the parts needed to get to this place cost quite a bit more money.

I was at Barnes and Noble yesterday looking at the latest Hi-Fi News (or the other UK mag; I can't recall). There was a US-made speaker in there that had exotic parts and an awesome finish. But the measurements made it clear that the design was a train wreck. That's where some of the high-priced stuff falls apart -- the design. This speaker was not hi-fi -- i.e., it could not be true to the source to save its life.

When great design and great parts and great execution come together you have something special.
 

Great point, Jeff. You can't make a silk purse out of a sow's ear! And in fact, it's sometimes surprising what designers can produce with ordinary parts. One product that always amazed me was the original Amber 70 amp. It beat the pants off of everyone's favorite amps at the time, especially the Haflers. Look inside, and it was very ordinary parts.

I remember the Amber amps from the late '70s into the early-to-mid '80s.

Rich
 

Yes, basically one and done. The preamp and integrated amp never went anywhere. But for solid-state at the time, it was very good, especially on MGIIIs.
 

I have no problem accepting a few of your points. Taking a test is stressful; let's agree on that. But why, under that same stress, are certain components (speakers, and transducers in general, to repeat myself, and to a certain extent electronics) repeatably discerned, with consistent descriptions of the perceived differences, while for others... you can finish my sentence?
If a given subject or group of subjects experiences stress under test and their abilities are reduced because of it, wouldn't that apply to ALL the components and tests? Doesn't that suggest that for some components the threshold is so low as to be unreliable -- that the threshold, and thus the reliability and accuracy of the perception, fades into the noise of uncertainty? I think you know the answer.

Now for the good-parts/bad-parts question, I sincerely don't know what to make of it. The great majority of high-end products do have carefully selected parts, but so do many mass-market products: some Pioneer Elite, some Onkyo, Integra, Cambridge Audio, etc.
On speakers I am on the fence. One would be surprised to go out and check the drivers used in some very expensive speakers: very ordinary at times. Oh, the makers claim modifications and specially manufactured models, but... On the crossover side, maybe; the drivers, though, arguably the most critical link in the chain, are often ordinary, or to be polite, quasi-ordinary. So I think it is the talent of the great designers to manage these constraints, and often contradictions, to arrive at good, often great, products.
Also, I have learned not to disparage mass-market products. The Integra receivers are the real deal. The Oppo DVD players in general, and their Blu-ray/universal players in particular, are mass-market and yet truly serious and honest reproducers of sound, and high-end reproducers of pictures. No hype needed, whether the tests are performed BT, DBT, or otherwise.

Frantz
 
On stress, one way to reduce it is to perform your own tests. As a participant in a test with others, the need to be right is almost assured to push people to vote incorrectly at least some of the time when differences are small. In the "privacy of your own home," that stress mostly goes away.
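For home testing, a bare-bones ABX trial runner is easy to script. A minimal sketch, assuming two audio clips and a command-line player available as `play` (both the filenames and the player name are placeholders):

```python
import random
import subprocess

def abx_trial(file_a, file_b, player="play"):
    """Run one ABX trial: play A, play B, then play a hidden X."""
    x_is_a = random.choice([True, False])
    x_file = file_a if x_is_a else file_b
    for label, path in (("A", file_a), ("B", file_b), ("X", x_file)):
        input(f"Press Enter to play {label}...")
        subprocess.run([player, path])
    answer = input("Was X the same as A or B? ").strip().upper()
    return (answer == "A") == x_is_a  # True if the guess was correct

def run_abx(file_a, file_b, trials=16):
    correct = sum(abx_trial(file_a, file_b) for _ in range(trials))
    print(f"{correct}/{trials} correct")

# Example: run_abx("amp1_clip.wav", "amp2_clip.wav")
```

Sixteen trials is a common choice because 12 or more correct answers put you past the conventional 5% significance threshold for pure guessing.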

It might be useful to make a list of Pros and Cons of scaled blind testing. Here is my quick shot:

Pros:
  • Removes participant and conductor bias.
  • Much more accepted scientifically as valid data.
  • Can represent views of collective audiences.
  • Provides excellent insight into the true ability of each participant. A person who fails the "control" that others pass can be readily dismissed as not having the required hearing ability.

Cons:
  • Expensive and time-consuming to set up. We used to pay $100 to $200/person to conduct such tests, plus the cost of the lab. So even a quick test cost us $30K+.
  • Easy to create a test incorrectly, arrive at the wrong conclusion, and call it valid because the core methodology is valid. This can happen both in the design of the test and in how it is run.
  • Unless analyzed by segments, averaging tends to mask the results of the more accurate participants. If 5 people can always hear a difference but 50 do not, the average roughly indicates a 50-50 outcome, or "pure chance" (see the sketch below). As audiophiles, we may care about the 5, not the other 50.
  • It is boring. Boring to run, and boring to participate in. Watching paint dry is better sometimes :). Imagine hearing the same 30 second audio clip over and over again for 30 minutes.
  • Per above, stress or competitive zeal can cause people to vote incorrectly.

There are probably other items that can go in the list but this is what I can think of quickly.
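The masking effect in the third con is easy to demonstrate with an exact binomial test. A quick sketch with hypothetical numbers (a few discriminating listeners among a large pool of guessers; the group sizes and scores are invented for illustration):

```python
from math import comb

def binom_p_at_least(k, n):
    """Exact one-sided binomial test: P(X >= k) when purely guessing (p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

TRIALS = 16                  # trials per listener
SHARP, GUESSERS = 5, 100     # 5 who reliably hear it, 100 who guess

pooled_correct = 14 * SHARP + 8 * GUESSERS   # sharp: 14/16 each; guessers: chance
pooled_trials = TRIALS * (SHARP + GUESSERS)

print(f"one sharp listener: p = {binom_p_at_least(14, TRIALS):.4f}")                    # ~0.002
print(f"pooled panel:       p = {binom_p_at_least(pooled_correct, pooled_trials):.3f}")  # ~0.08
```

Each of the five is individually a near-certain discriminator (p of about 0.002), yet the pooled panel score of 870/1680 (about 52%) fails the 5% significance test. Averaging buried them.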
 

I've been participating in controlled listening tests for 25 years, both as an administrator and as a subject. Listeners have never reported stress as an issue when doing their task except when:

1) They are not given proper instruction in their task, or lack proper training/practice in it.

2) They have something personal at stake (their reputation/ego as a golden ear, or they know their own product is in the test -- which they shouldn't, since this causes expectation bias).

3) The test is too long (> 30-40 minutes) or too difficult (too many scales/inputs required from the listener, or the audible effects are very small). You can clearly see these effects in the listener's decreased performance.

All of these problems can be resolved if you properly train and select the listeners, and design tests that minimize stress factors.
 

Wow! At $100-200 per listener, I could have made a good living as a professional Microsoft listener. Our trained panels are internal employees, so they get their salary plus a $15 Amazon credit per test as an incentive.

When we recruit consumer panels, that can get costly. We got recent estimates for testing 200-300 high-income Japanese subjects in Japan that ranged from $100-300k. Recruiting wealthy people to participate in a listening test is not easy, since they are not motivated by the money, are busy, and have better things to do.

I agree with all of your points. I see a lot more listening tests being submitted for publication in journals these days where the results may be valid for the conditions tested, but the conditions don't reflect anything that exists in reality, or the premise behind the hypothesis being tested is flawed. You end up having to reject the paper for publication.

It's always sad when someone spends so much time, effort, and money on listening tests whose basic research question, and the premise behind it, is invalid.
 
:)

The numbers I quoted were for outside agencies to recruit test subjects, just the same. We also had our own salaried employees who did testing, but they were not sufficient for large-scale testing. Nor were their views accepted when we were trying to publish results, as they could be accused of knowing our sound and preferring it as a result.

We did use the general Microsoft population for testing, which didn't cost us anything, but the issue there was that we could not force them to do the test right away.
 
Before we get to intangibles, let's explore some testing basics, especially the neglect of the organism being tested.

Does each speaker go to the same spot, or do you predetermine the correct room placement for each speaker first? And do you use ML electronics for all your tests? If so, then I might point out that one can end up developing a speaker that will only work with one amplifier/electronics chain. (This is not a trivial matter, since I once ran into a case where an amplifier designer used a single speaker to design/voice his electronics. It turned out that the amplifier only sounded good on that one speaker, and at least four quite dissimilar speakers were tried with it.) And I'm not quite clear how you prevent the listener from identifying the speaker they're listening to.

We normally test each speaker in the same location so that it excites the room resonances/reflection patterns equally. If there are audible differences between the speakers under test, they are due to the speakers themselves and not the room, since the room/speaker-position variable is held constant. The chosen speaker/seat location produces a reasonably smooth low-frequency response at the primary listening seat.
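For readers wondering why position matters so much, the low-frequency resonances of an idealized rectangular room follow a standard formula, f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2). A minimal sketch (the room dimensions are hypothetical):

```python
from math import sqrt

C = 343.0  # approximate speed of sound in air at room temperature, m/s

def room_modes(lx, ly, lz, max_order=2):
    """Resonant frequencies of an ideal rectangular room (rigid walls)."""
    modes = []
    for nx in range(max_order + 1):
        for ny in range(max_order + 1):
            for nz in range(max_order + 1):
                if nx == ny == nz == 0:
                    continue  # skip the trivial (0,0,0) case
                f = (C / 2) * sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
                modes.append((round(f, 1), (nx, ny, nz)))
    return sorted(modes)

# Hypothetical 6 x 4 x 2.7 m room: the lowest axial mode lands near 28.6 Hz.
for f, mode in room_modes(6.0, 4.0, 2.7)[:8]:
    print(f"{f:6.1f} Hz  {mode}")
```

A speaker sitting near a pressure maximum of one of these modes will energize it strongly, and one near a null will not, so moving the speaker even slightly changes the bass response -- hence holding position constant across the speakers under test.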

We use amplifiers that are accurate/neutral (flat frequency response, low distortion and noise) and have adequate power to drive any speaker load we are testing. If the loudspeaker sounds bad, it's not because of the amplifier (which is accurate) but because of the loudspeaker. Are you suggesting we use poorly designed amplifiers to compensate for poorly designed loudspeakers? That would be like giving the listener an equalizer and saying: adjust this poor-sounding speaker until you prefer it.

Two: how do you account for the learning process?

We carefully select listeners based on their audiometric performance and their demonstrated ability to perform training tasks and to listen in a consistent and discriminating manner. Some people just don't have the right stuff when it comes to listening. We monitor their performance in training tasks using various performance metrics, and in actual tests. From this, we can tell when they have reached their peak learning, based on the amount of time it takes to complete a task and their performance in it.

How do you control for adaptation of the organism to the stress?

Based on their listening performance, there is no evidence that our trained listeners are stressed. I've already commented on the factors that can induce stress in a listening test in another post, and we try to minimize them.

Then, while I think you're trying to "teach" novices about listening, in the end what you're testing is their ability to learn what you taught them, not necessarily what constitutes music.

We use music as test signals. So how does that not constitute music?

The object of a listening test is to get subjects to focus on and describe sound quality differences -- not the music. When you focus on the music, you become useless as a critical listener. Ask people who listen to crappy iPod earbuds or portable radios how they can stand listening to such bad sound, and they will respond: "I don't care about the sound, I just care about the music." As long as people can hear the melody, harmony, and rhythm and understand the lyrics, they can enjoy music on pretty well anything.

I've tested many musicians and worked with them when I was a recording engineer, and they are not always the best people to ask about sound quality. If you look at the audio scientific literature, musicians are among the worst subjects for evaluating audio equipment -- until they are properly trained. My best listener was an engineering intern who had never had a music lesson in his life.

Now, I understand what you're trying to accomplish, and that is laudable, but one can't just accept everything at face value and not think about the test methodology.

I've been thinking about the test methodology for the past 25 years, and have actually contributed to the scientific literature on the topic. Read my papers, Floyd Toole's book "Sound Reproduction", or Bech and Zacharov's book on perceptual audio evaluation. My PhD research had an element of listening-test methodology research in it: I presented the same stimuli (music reproduced through different loudspeakers in different rooms) to two separate subject groups using a successive versus an intermixed trial design, to see how this affected listeners' adaptation to the room acoustics.

I'm not saying there isn't more research to be done in listening test methodology, but it is a pretty mature scientific field, and many of the most important questions about methodology have been already addressed.
 

Was this a MUSHRA-type sound quality test (multiple comparison with reference and anchors) or an ABC with hidden reference? Could Microsoft employees identify the Microsoft codec even under blind conditions?
 
Was this a MUSHRA-type sound quality test (multiple comparison with reference and anchors) or an ABC with hidden reference?
They were usually of the latter type. Note that in some cases the testing methodology was dictated by the lab, so that it would not look like we were influencing it.

Could Microsoft employees identify the Microsoft codec even under blind conditions?
At lower bit rates, yes, especially on familiar material. At higher bit rates, I am not certain. I used to think I could tell, but when tested blind once, I could not :). However, in that case there were anomalies in how the test was run, so I am not sure it was conclusive.

How about your testers? I would think they can more easily pick out the units they have used for training.
 
BTW, the way I could tell was by knowing the exact science behind our compression system and hence being able to spot when it would get into trouble. If I did not know that, I don't think I would be able to identify the right technology.
 

I can see that if listeners have knowledge of what is being tested, know how it sounds (once a product can be recognized by its sound, the test is no longer blind), and know how it measures and works, that would constitute a serious bias. If I have seen measurements of the loudspeakers before a test, and/or I am told there is a particular loudspeaker behind the curtain, it completely throws me off: I start "listening for the product" rather than focusing on the task at hand. More often than not I am wrong, and my error variance grows and my inter-listener reliability drops so much that I just add noise or outliers to the data.

So I would agree that the less the person knows about what is behind the curtain, the better they will perform, and the less biased the test will be. We avoid telling our trained listeners what is being tested, and we don't involve people directly involved in the design of the product unless they are unconvinced of the results and need to hear it for themselves. We can leave their data out of the final tabulations.

I routinely retest our products with external listeners to confirm that our tests are not exhibiting these types of biases.
 
How do you train your listeners, Sean? Is there speaker-specific training to, for example, hear non-linearities in drivers, etc.?
 
Sean,

Putting the speaker behind a curtain doesn't affect the sound? Many audiophiles remove their speaker grille cloths when doing serious listening in order to improve the sound of the speaker.
 
I was sitting here thinking about some of the ridiculousness of the DBT/measurements crowd and their proponents' total lack of comprehension of music. Musical emotion can't be broken down into 0s and 1s, or lots of waves, or Fourier transforms floating around an oscilloscope, or statistical tests. Let's face it: the human ear is the best test instrument ever placed on this Earth. And for those who say it's unreliable, is the ear any more unreliable than our eyes? Certainly our eyes can be fooled. What riles me up even further are those who trust their vision but don't trust their ears!

Hell, when was the last time you ever heard of a musician buying a violin or piano based on its measurements? When have you ever heard a musician say all instruments sound alike? Try never. I've shopped with several musicians looking at the best pianos, and they use the exact same terms (maybe just slightly different jargon) to describe the sound of the pianos they're auditioning. "Look, the piano measures flat." "Look, the parts used in building the instrument don't matter." ROFLMAO. (And to boot, those who have measured masterpieces like Strads are still at a loss to explain what gives the instrument its beautiful tone.) Utter nonsense.

Sorry to be the bearer of bad news, but... The human ear might be a good test instrument when we are born, but it starts to deteriorate once we hit the age of 18 and gets progressively worse with age due to presbycusis. From about age 42, the average hearing loss increases by 1 dB per year, according to Dr. Brian Moore at Cambridge University, an expert in hearing and hearing loss. If you have exposure to loud noise, you can suffer even more serious permanent hearing damage.

Does hearing loss affect our ability to make reliable judgments of sound quality? It most certainly does. The research of Dr. Floyd Toole indicates a clear relationship between mean hearing loss below 1 kHz and higher standard deviations in listeners' loudspeaker ratings (see figures 1 and 2 in this paper). For this reason, we screen our listening panel based on hearing, and reject any listener with more than 15-20 dB HL at any audiometric frequency. We don't have many rock-and-roll musicians or listeners over the age of 40 on our panel.
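As a concrete illustration, that screening rule is easy to encode. A minimal sketch (the audiograms, frequency set, and 20 dB cutoff shown here are examples, not necessarily the exact criteria the author uses):

```python
AUDIOMETRIC_FREQS = [250, 500, 1000, 2000, 4000, 8000]  # Hz

def passes_screening(audiogram, max_hl_db=20):
    """Accept a listener only if hearing loss (dB HL) stays within
    the cutoff at every audiometric frequency tested."""
    return all(audiogram.get(f, 999) <= max_hl_db for f in AUDIOMETRIC_FREQS)

# Hypothetical candidates: measured dB HL at each frequency.
young_listener  = {250: 5,  500: 5,  1000: 10, 2000: 10, 4000: 15, 8000: 15}
concert_veteran = {250: 10, 500: 10, 1000: 15, 2000: 25, 4000: 40, 8000: 35}

print(passes_screening(young_listener))    # True
print(passes_screening(concert_veteran))   # False -- noise-induced notch near 4 kHz
```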

You ask why people don't trust their hearing more than their vision. The reason is that vision is the more dominant sense. When there are bi-modal conflicts between auditory and visual cues, vision usually wins. The ventriloquism effect and the McGurk effect are two examples.

Your example of musicians purchasing instruments based on listening (not measurements) may well be true, but it's a flawed process when applied to choosing equipment for sound reproduction, IF the goal is to accurately and faithfully reproduce the art: the music recording. Science-based subjective and objective measurements have proven more accurate, reliable, and effective in getting us closer to this goal than casual, uncontrolled, biased listening using uncalibrated, possibly defective devices (our aged ears).

Finally:
1. Just because no two musical instruments sound the same (do you have proof?) doesn't mean that two audio components cannot. In fact, there is evidence in the literature that two well-designed amplifiers sound the same to many listeners under controlled listening conditions, at least until the amplifiers are driven into distortion.

2. Another update: Science has figured out why Strads sound the way they do according to this article.

3. Not all DBT/measurements proponents have a "total lack of comprehension about music". I have studied music since about the age of 7, including classical piano, theory, orchestration, and conducting. Being a musician and a scientist are not mutually exclusive.
 
How do you train your listeners, Sean? Is there speaker-specific training to, for example, hear non-linearities in drivers, etc.?
We have listener-training software that uses DSP to simulate frequency-response aberrations and different spatial and dynamic attributes. For nonlinear distortions in loudspeakers, we have a Matlab tool that simulates and auralizes the nonlinear behavior of a loudspeaker's magnetic/mechanical/inductance parameters, based on arbitrary values or actual values measured with Klippel's measurement system.
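Those tools are proprietary, but the frequency-response half of that kind of training can be approximated with a standard parametric (peaking) EQ. A minimal sketch using scipy and the RBJ audio-EQ-cookbook biquad; the file names and filter settings are placeholders, and this models only a linear coloration, not the nonlinear driver behavior described above:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import lfilter

def peaking_eq(f0, gain_db, q, fs):
    """Peaking-EQ biquad coefficients (RBJ audio EQ cookbook)."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

# Impose a +6 dB resonance at 3 kHz -- a typical coloration to train on.
fs, x = wavfile.read("reference_clip.wav")          # placeholder input file
x = x.astype(np.float64)
b, a = peaking_eq(f0=3000, gain_db=6.0, q=2.0, fs=fs)
y = lfilter(b, a, x, axis=0)
y /= np.max(np.abs(y))                              # normalize to avoid clipping
wavfile.write("colored_clip.wav", fs, y.astype(np.float32))
```

A trainee would compare the original and colored clips until the resonance becomes easy to identify and describe.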
 

OK, so I propose that no one over the age of 20 should be involved in designing speakers or listening to high-end audio gear. That solves that problem.

Pardon me, but vision being the dominant sense is news? Think you've discovered something new? Try standing on one leg; then close your eyes and stand on one leg. I bet you'll fall over! The Russians showed us this a half century or more ago.
 