In Double Blind testing, is louder perceived as better?

jack_bouska

Well-Known Member
Jan 9, 2011
jgbouska.tripod.com
I often read about the requirement for precise acoustic level matching when conducting double blind acoustic testing on audio equipment. Apparently, mismatches in SPL are a variable which may permit human subjects to more readily distinguish between individual components in A/B/X tests.

I may be mistaken, but I also recall reading (some years ago) that if the levels are mismatched, then the louder of the two devices-under-test is subjectively perceived to have superior accuracy or quality compared to the quieter device. Unfortunately, my cursory search of the internet has failed to yield any references to published scientific articles which may substantiate this “louder is better” supposition, other than anecdotal claims that audio salesmen often turn up the volume of the more expensive (higher profit) components during customer auditions (similar to the Pepsi Challenge).

I don’t wish to initiate a debate on whether these assertions are true or not, but I would like to ask if any forum members could point me towards any published references on these topics. I am interested in finding references where researchers have studied the various “nuisance variables” associated with double blind testing, and where the authors report on one or both of the following subjects:
1) What level of precision is required for matched levels to eliminate the variable?
2) Evidence of statistically significant subject bias or preference (judgement of quality) when levels are not matched.

E.g., in “A New Laboratory for Evaluating Multichannel Audio Components and Systems”, Sean E. Olive states “..Spirit 328 digital mixer which provides signal switching and level matching (within 0.03 dB)..”. While this paper states a precise dB figure, I would like to know at what point level mismatches become statistically significant, and whether this in turn biases listener preference.

Also, in “Perceived sound quality of reproductions with different frequency responses and sound levels” (JASA 1990), Alf Gabrielsson states: “Another important physical factor is the sound level. The available evidence indicates that an increase in sound level will usually increase the perceived fullness spaciousness and nearness as well as sharpness and brightness and decreasing sound level gives the opposite results.”
Do any forum members know of other available references to support Alf’s findings?

Thanks
Jack Bouska
(google “jack bouska” for more info)
 
Yes, it's generally accepted that louder = sounds better, mainly due to Fletcher-Munson. Unless of course it's already too loud, in which case even louder = worse.

I think the usual advice is to match levels to within 0.1 dB. That's below the threshold most people can hear a difference, though it depends on the frequency and also the acoustics of the room you listen in. A dead room has less "clutter" going on, letting you hear fine detail better.

Unfortunately, I don't have any scholarly references. But this seems like common sense to me, and most people accept that music sounds clearer and fuller at louder volumes.

--Ethan
 
I think the answer is generally yes, all things being equal. However, in the case of loudspeakers all things are seldom equal. Different models of loudspeakers will have audible differences in terms of timbre, nonlinear distortion and spatial attributes that provide listeners additional cues besides loudness. The more significant those audible differences are, the less significant the role of relative loudness differences will be. This is particularly true if the listeners are trained, and can sort out loudness differences from perceptions related to timbre/spatial/distortion differences.

As long as the frequency responses of the devices under test are similar, matching for equal loudness is straightforward. When the devices are not well-matched in frequency response, the perceived loudness among the devices tested will vary according to the program and its spectra.

It is also important to control the test for absolute loudness, particularly if you are listening at levels where the nonlinear/power compression behavior of the devices under test starts to become a factor. A good example is testing small powered speakers (e.g. iPod docking stations) that have limited excursion, small amplifiers and built-in limiters that sometimes dramatically change the frequency response depending on the playback level. Here, the playback level can determine the results of the listening test.

Re: Gabrielsson's comments on the relationship between level and spaciousness/brightness: there have been some concert hall studies on spaciousness (Barron, Marshall, Ando, etc.) and more recent ones by Soulodre and Bradley on listener envelopment (LEV) that show spaciousness/envelopment is related to many factors (frequency content, time delay, angle of arrival, relative level of lateral reflections to direct sound), including absolute level.
 
Of course it is... especially in my job. I always have to match levels within 0.1 dB so I can be sure whether I'm making the tracks sound better or worse. The biggest scam in mastering is making the mastered track louder, because it's always going to be perceived as better!
 
There are AES papers but I am no longer a member. I'll try to look this weekend to see what I have in my files. One of the problems with A/B (ABX, etc.) is that we are able to fairly readily distinguish much smaller volume changes in those tests. For example, if you play with the knob on a mixer, it takes about 3 dB before most people will notice it got louder (or softer), and 1 dB or so is the threshold in more controlled tests. However, when switching between two sources (actually, tests were run with the same source but at two different levels via a fancy attenuator), most people can detect well under 1 dB, thus the 0.1 dB threshold for reliable testing. And yes, the louder one almost invariably wins.

That said, for A/B testing of different speakers in particular, it's really a crap shoot (can I say that?) because you can only really match the levels at one frequency, or perhaps a small range of frequencies, and the response may vary widely outside that calibration point.
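
For the ABX comparisons mentioned above, one common way to judge whether a listener's score is "statistically significant" is a one-sided binomial test against pure guessing. The sketch below is only an illustration of that idea: the function name `abx_p_value` and the 16-trial example are assumptions for the example, not anything taken from the papers discussed in this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing.

    Each ABX trial is treated as a fair coin flip (p = 0.5), so this is
    the one-sided tail of a binomial distribution.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical session of 16 trials with three possible scores.
for score in (9, 12, 14):
    print(f"{score}/16 correct: p = {abx_p_value(score, 16):.4f}")
```

A small p-value only says the listener was probably not guessing; it does not say whether the cue they used was a level mismatch or a real difference between the devices, which is why the levels get matched first.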

FWIWFM - Don
 
Wouldn't you need to set levels with Pink Noise? That's what I've always done.
 
I have used pink noise, yes, to match average (typ. midband) levels to 0.1 dB or better. The problem is that different speakers will typically emphasize (or reduce) different frequencies, even in an anechoic chamber. So, the average may not yield two speakers that sound the same volume even if the average value on the meter matches. There's no way I know to match broadband to 0.1 dB with two different speakers (outside of filtering or processing to achieve identical freq response). I have also used spot frequencies, typically somewhere between 500 and 2000 Hz (inclusive), and matched to well under 0.1 dB. You have to stay away from the crossover frequencies, and watch for room modes and comb effects, etc.
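
As a rough sketch of the broadband matching step described above, the snippet below computes the RMS level of two captures and the gain trim needed to bring one in line with the other. Everything here is assumed for illustration: the helper names (`rms_db`, `trim_db`, `apply_gain`), the use of white Gaussian noise as a stand-in for a measured pink-noise capture, and the 0.4 dB offset. It only matches the average level, so it does nothing about the frequency-response differences between speakers mentioned above.

```python
import math
import random


def rms_db(samples):
    """RMS level in dB relative to an amplitude of 1.0."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def trim_db(reference, device):
    """Gain in dB to apply to `device` so its RMS level matches `reference`."""
    return rms_db(reference) - rms_db(device)


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB (20*log10 amplitude convention)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [x * factor for x in samples]


random.seed(0)
# Stand-in for a measured noise capture from the reference device.
noise = [random.gauss(0.0, 0.1) for _ in range(48000)]
# Hypothetical second device that plays the same signal 0.4 dB low.
quieter = apply_gain(noise, -0.4)

trim = trim_db(noise, quieter)
matched = apply_gain(quieter, trim)
print(f"trim needed: {trim:+.2f} dB")
print(f"residual mismatch: {abs(rms_db(noise) - rms_db(matched)):.4f} dB")
```

In a real setup the two captures would come from a measurement microphone or a voltmeter at the speaker terminals, and the residual would be limited by the meter's resolution rather than by arithmetic.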

Personally, I have used fine amplitude matching mainly to assess the electronics, sticking with the same speakers for the trials. Auditioning different speakers, there's so much difference that volume is not a huge concern. It's much easier to hear the differences between two speakers than two SS amps or preamps, for instance. I am much more likely to tell the difference between a B&W 801 and Wilson Alexandria than a pair of Krell and ML reference monoblocks (I have done those tests).

All IMO, of course! - Don
 
Added Tom Nousaine's article which was published in AES in addition to magazine on reliability of one's ear.
 
Do any forum members know of other available references to support Alf’s findings?

Of course, there is the 1990 paper itself, in which he refers to an earlier one (Gabrielsson 1979), where basically the same statements are made.

However, Gabrielsson (1974) states that “the setting of the sound level and the frequency response affect the ratings and/or rankings of the resulting reproductions in different ways at different music sections.”

Staffeldt (1974) found that “the judgements did not change when the level of the reproduction was changed by 25 dB.”

McDermott (1969) found that different levels have different preferences.

Eisler (1966) appears to make statements similar to Gabrielsson (1974), i.e. that the effect of sound level depends on the program material.

If you want to have a copy of the papers, let me know.

Klaus

Eisler, “Measurement of perceived acoustic quality of sound-reproducing systems by means of factor analysis”, JASA 1966, p.484

Gabrielsson, “Judgements and dimension analyses of perceived sound quality of sound-reproducing systems”, JASA 1974, p.854

Gabrielsson, “Perceived sound quality of sound-reproducing systems”, JASA 1979, p.1019

McDermott, “Multidimensional analyses of circuit quality judgements”, JAES 1969, p.774

Staffeldt, “Correlation between subjective and objective data for quality loudspeakers”, JAES 1974, p.402
 
Thanks for listing the various papers Klaus.

What I am interested in is research involving listeners trained for loudness/pitch/etc., which Sean briefly touched upon.
A friend of mine taught me this years ago when I started to learn music (he was playing for the LSO even back then), and a few of my other friends are classically trained musicians.
What is interesting is that loudness does not affect their perception of reproduced music, and we have done a few blind tests; for them and for me it is just louder, and may include musical information that was too quiet before.

But I am rather curious what happens if the loudness difference is below their level of trained perception, say only 0.2 dB to 0.3 dB, for music with no quick ABX switching.
In these cases would tone/timbre and spatial attributes be deemed better quality on the louder playback system in a linear test (i.e. the same system played within its limits)?

It seems a rather complex field, especially when you consider loudness in terms of musical instruments instead of a playback system. Does an A note sound better with a different amount of loudness applied by the musician (again within limits)? Does tuning by ear by traditional experts give different results if the note played is subtly different? And for voices in, say, a choir, does it sound better if you stand 2-3 m closer?
Defining what "better" means also complicates this, as it affects how you subjectively question/measure in the study; it may be too generic in some studies, especially when one considers ideal preference.

Cheers
Orb
 
Thanks for listing the various papers Klaus.

One paper I forgot is by Geddes (http://www.aes.org/e-lib/browse.cfm?elib=13722): "sound level significantly affects the perception of linear distortion in audio systems."

What is interesting is that loudness does not affect their perception of reproduced music, and we have done a few blind tests; for them and for me it is just louder, and may include musical information that was too quiet before.

If you look at the equal loudness contours, you see that they get flatter with increasing absolute level, so what was too quiet before no longer is:

[Figure: equal loudness contours (lindos4.png)]

But I am rather curious what happens if the loudness difference is below their level of trained perception, say only 0.2 dB to 0.3 dB, for music with no quick ABX switching.
In these cases would tone/timbre and spatial attributes be deemed better quality on the louder playback system in a linear test (i.e. the same system played within its limits)?

Maybe the individual equal loudness contour or the individual perception threshold of distortion is such that louder is not always better. If Gabrielsson (1974) and Eisler are correct, it also may depend on program material.

It seems a rather complex field, especially when you consider loudness in terms of musical instruments instead of a playback system. Does an A note sound better with a different amount of loudness applied by the musician (again within limits)? Does tuning by ear by traditional experts give different results if the note played is subtly different? And for voices in, say, a choir, does it sound better if you stand 2-3 m closer?

Many musical instruments have quite unusual directivity patterns, so I think that changing the distance between the listener and the instruments will also change the level and perceived spectrum of the early reflections (because of the different angle of incidence of the respective reflection on the pinna), so it will not only be louder but possibly also spectrally different. I’ve made a compilation of directivity patterns of all instruments I could find data for; if interested, let me know.


Klaus
 
Ethan Winer said:
...I think the usual advice is to match levels to within 0.1 dB. That's below the threshold most people can hear a difference, though it depends on the frequency and also the acoustics of the room you listen in.

I thought the threshold for hearing differences was a little bit higher than 0.1 dB:

From table 18.1 of the Springer Handbook of Acoustics, the JND value for the amplitude perceptual limit is 0.25 dB.

Is it referring to the same parameter?
 
Springer lists 0.25 dB, others list 0.5 to 1 dB for the JND in volume. The 0.1 dB figure came about as a guideline for ensuring testing was below the JNDs. That is, if the JND threshold is X, you want to make sure you are using X less some margin for error and to ensure correlated sources don't pass above the JND threshold.
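
To put the figures quoted in this thread on a common footing, here is a small sketch converting a level difference in dB to the corresponding amplitude (voltage) and power ratios. The function names are made up for the example; the dB values are just the ones mentioned in the posts above (0.03, 0.1, 0.25, 0.5, 1 and 3 dB).

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude (voltage or sound pressure) ratio for a level change in dB."""
    return 10.0 ** (db / 20.0)


def db_to_power_ratio(db: float) -> float:
    """Power ratio for a level change in dB."""
    return 10.0 ** (db / 10.0)


# Level differences mentioned in the thread.
for db in (0.03, 0.1, 0.25, 0.5, 1.0, 3.0):
    amplitude_pct = (db_to_amplitude_ratio(db) - 1.0) * 100.0
    power_pct = (db_to_power_ratio(db) - 1.0) * 100.0
    print(f"{db:>5.2f} dB -> amplitude +{amplitude_pct:.2f}%, power +{power_pct:.2f}%")
```

A 0.1 dB match works out to roughly a 1.2% tolerance on voltage, which gives a feel for how fine the gain adjustment and the meter need to be.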
 
Also, from my own experience using music instead, I find hearing is even less sensitive, which is why I wonder whether we may not consciously notice the difference, and when that occurs, instead of perceiving loudness (if trained for pitch-loudness) we hear improvements in spatial information, etc.
Not proved or studied.

Oh Klaus, thanks for the offer, but somewhere I have a good paper on directivity patterns of musical instruments and musicians, combining theory and practice for improving acoustics for an international orchestra.
Good point on the loudness contours; that was something I was sort of hinting at: those trained in pitch-loudness can recognise what music is missing between playing too low and playing at the ideal level. For them it is a parameter, while others may classify it as an improvement because they do not realise the sound was only missing due to being played too quietly to be picked up.
I was lucky many years ago to get training from a talented musician who plays in the LSO, while some of my friends are classically trained and have also played in orchestras/brass bands; in our cases we seem to notice loudness more as a parameter, so whether sound was missing information or is just played X dB louder, it just seems louder.

But here is one thing that really does interest me: while loudness for me (unless something occurs just below our perception, as I mention) does not equate to better, I have to say musical chords do equate to being better.
This is the biggest trend I have seen in dance music (using this as an example as it has a large following and complex rhythms), more so than the loudness wars, which are more recent.
By this I mean most dance tracks rely upon using various chords instead of individual notes, sometimes simple major chords in a beat and other times quite complex ones; it's fair to say that the vast majority of dance tracks rely upon chords.
And I have to say, I am a sucker for chords more so than loudness IMO, and it seems the largest purchasers of music (best-selling dance tracks compared to pop music) may feel the same way.

It would be interesting if those who did research into "louder is better" also looked into the perception of chords/polyphonic notes and why these may come across better; maybe it's to do with the harmonic complexities of the notes when looking in both time and frequency.

Thanks
Orb
 
Orb, the threshold with actual musical sources is much higher, up around 1 - 3 dB IIRC, except in testing with fast switching (< 6 seconds or so; again IIRC, that's about the length of our auditory memory).

About those chords... If you've a calculator or math program you can pick a starting frequency (A = 440 Hz works), split it into equal tones 1/12th apart to the next octave (880 Hz), then see how the frequencies and harmonics in a chord (e.g. 1,3,5) line up using basic mixing (nonlinear multiplication, e.g. x^2, x^3, etc. terms). What you'll find is that they don't -- some strange frequencies are generated that won't sound good. When you bring the 3rd down and the 5th up, the mixing products line up, and it sounds better. Comes from basic music theory, and there is a mathematical reason for it.
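
The calculator exercise above can be sketched roughly as follows. The assumptions are mine: the chord is taken as a major triad (root, major third, fifth), "bringing the 3rd down and the 5th up" is read as moving to the simple 4:5:6 ratios, and only the difference tones produced by a quadratic (x^2) term are listed; none of the names come from the thread.

```python
from itertools import combinations
from math import log2

ROOT_HZ = 440.0  # A = 440 Hz, as suggested above


def equal_tempered(semitones: int) -> float:
    """Frequency `semitones` above the root in 12-tone equal temperament."""
    return ROOT_HZ * 2.0 ** (semitones / 12.0)


def difference_tones(freqs):
    """Pairwise difference frequencies, as produced by an x^2 nonlinearity."""
    return sorted(hi - lo for lo, hi in combinations(sorted(freqs), 2))


def cents(freq: float, reference: float) -> float:
    """Interval from `reference` to `freq` in cents (100 cents = 1 semitone)."""
    return 1200.0 * log2(freq / reference)


# Major triad: root, major third (4 semitones), fifth (7 semitones).
et_triad = [equal_tempered(s) for s in (0, 4, 7)]
# Assumed "lined up" tuning: frequency ratios 4:5:6.
just_triad = [ROOT_HZ * ratio for ratio in (1.0, 5.0 / 4.0, 3.0 / 2.0)]

print("equal-tempered triad:", [round(f, 2) for f in et_triad])
print("  difference tones:  ", [round(f, 2) for f in difference_tones(et_triad)])
print("4:5:6 triad:         ", [round(f, 2) for f in just_triad])
print("  difference tones:  ", [round(f, 2) for f in difference_tones(just_triad)])
print(f"ET third vs 5:4: {cents(et_triad[1], just_triad[1]):+.1f} cents")
print(f"ET fifth vs 3:2: {cents(et_triad[2], just_triad[2]):+.1f} cents")
```

With the 4:5:6 tuning the difference tones fall on 110 Hz and 220 Hz, which are octaves below the root, whereas the equal-tempered triad produces difference tones a few Hz away from those values. The equal-tempered third comes out about 14 cents sharp of 5:4 and the fifth about 2 cents flat of 3:2, which matches the direction of the adjustment described above.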

Another point: those chord notes include overtones, and when a group plays (sings, whatever) well together, then all the product tones line up and a rich spectrum of pleasing subharmonics and harmonics is generated. That is one of the reasons why the best groups (orchestras, bands, choirs, etc.) sound so "big" compared to lesser groups.

HTH - Don
 
Thanks Don,
Yeah, I was just going by my own and my friends' experiences relating to music and volume, not necessarily ABX but digital 0.1 dB volume increments made by a hidden other person. I appreciate that ABX sensitivity will differ, and for timescales beyond auditory memory when not fast ABXing, hehe, I really would think it's going to be a pretty high dB number.

Relating to splitting the chords into equal tones, I get what you're saying, but when you look at the complexity of a single note from, say, a trumpet, it is never perfect.
And this is made even more complex when considering a musical note from an instrument in the time domain as well as the frequency domain (consider the spectral envelope, which shows how much the harmonics' amplitudes can change each ms).
Here is an example of a note from a trumpet showing partially what I mean:
http://www.cco.caltech.edu/~boyk/spectra/spectra.htm
The upper trace in Figure 1(a) shows the spectrum of a concert B-flat played on a trumpet with a Harmon ("wah-wah") mute.

A simple tone does not have the same complexity as a real instrument, and it's with real instruments that this becomes interesting with chords.

Or am I missing what you meant? Quite possibly, I feel lol :)
But I agree, the overtones/rich spectrum do sound big and pleasing.
However, I'm trying to relate the mechanism to "why is loud better", when IMO it's actually more that chords seem better, and part of this critically relates to the time domain as well IMO.
This seems to be the case even if trained for tone-loudness, but there have been no studies looking at both and then defining the subjective term "better".



Thanks
Orb
 
Hey, I play trumpet! I know exactly what you mean... The thing is, in those chords, all the overtones need to line up, and all their mixing products, to get a good-sounding chord. In practice, it's never perfect, but it can be awfully close. We (musicians) listen and adjust our pitches so the chord sounds and "feels" right. When that happens, an FFT will show harmonic relationships among all the overtones (above and below). Aside: "overtones" because the tones are not all harmonics; the overtone series from a trumpet can be nonharmonic for various reasons, among them that the actual source (flapping lips) tends to approximate an impulse function, and the bell flare that provides the distinctive characteristic sound of the horn allows non-harmonic frequencies to be emphasized.

In this case, time or frequency domain (related by the good Dr. Fourier) will show the same thing, properly interpreted.

However, I think we've drifted a bit off-topic, and don't want to get yelled at so will cease my musings. - Don
 
Just as well, as I misread your post 16 rofl and took it that you were applying a relationship of energy for each partial.
In both cases periodic oscillation exists (whether single note or chords), but for me the interest is in how chords are indeed perceptibly better.
My point relating to the time domain is that the pattern of amplitude changes for the partials will also have an effect on perception, as this is also critical to how an instrument/note sounds.

I do not see us disagreeing, but maybe you're thinking of the actual mechanics of why we prefer chords to a single note?
Still, as a musician yourself, do you feel that when something is louder it is not necessarily better, but just more information if it was too low, or all information just louder (you can tell it is louder rather than being different and improved)?
And, curious, do you find chords more pleasing yourself?

It is sort of back on topic :)
Cheers
Orb
 
Quickly,...

To go to extremes, do you prefer a single tone, or a piece of music?

Dynamics in music add contrast and thus interest. All the same is boring, yes? Actually, one of the tricks is to play softly with the same intensity and richness of tone as when playing loudly...

Of course, I love a good chord!
 
