Why Live-Versus-Recorded Listening Tests Do Not Work

tonmeister2008 · Jul 10, 2010

Figure 1: Singer Frieda Hempel conducting a Tone Test at Edison Studios, NYC in 1918. Note that many of the listeners' ears are covered by the blindfolds, making it a double-blind and double-deaf listening test, since the experimenter, Edison, was deaf himself.

Recently I was asked how I could possibly prove or assert that listeners prefer accurate loudspeakers without having performed a live-versus-recorded listening test. This is a test where the listener compares a live musical performance to a recording of the performance reproduced through loudspeakers. The closer the sound quality of the reproduction is to that of the live performance, the more accurate the loudspeaker is deemed to be - at least in theory. In practice, these tests are usually riddled with so many uncontrolled nuisance variables that the results are essentially meaningless. This article examines why live-versus-recorded listening tests are not suitable for serious scientific investigations of the perceived sound quality of recorded and reproduced sound.

Edison’s Tone Tests: “People will hear what you tell them to hear”
Thomas Edison was one of the first audio engineers to embrace live-versus-recorded demonstrations. In 1910, he invented the Edison Diamond Disk Phonograph, which he claimed had “no tone” of its own. To prove it, a series of road shows was given across the United States where about 4,000 live-versus-recorded demonstrations of his phonograph were conducted in auditoriums. At some point during the live music performance there would be a switch over to the recorded performance, and apparently audience members could not tell the difference between the live and recorded performances.

After a 1916 live-versus-recorded demonstration in Carnegie Hall, the New York Evening Mail stated “the ear could not tell when it was listening to the phonograph alone, and when to actual voice and reproduction together. Only the eye could discover the truth by noting when the singer’s mouth was open or closed” [1].

By today’s standards, the fidelity of Edison’s disc phonograph was egregious in terms of its noise, distortion, limited dynamic range, bandwidth and frequency response (you can hear some of Edison’s recordings online here). It’s hard to imagine that listeners were fooled into thinking his Diamond Disk recording could not be distinguished from the live performance. In fact, we now know that Edison manipulated the tests to produce the results he wanted. First, he carefully chose the music and musicians to work within the technical limitations of his technology. Edison detested music with extreme dynamics, high tones, vibrato and complex textures because they were a challenge to his deafness and his Tone Tests. He selected and coached musicians to mimic the sound of their recordings to minimize the audible differences between live and recorded performances [1],[2].

Secondly, Edison was the consummate audio salesman and was known to say, “People will hear what you tell them to hear” [2]. The expectations and perceptions of his listeners were manipulated before the test to produce a more predictable outcome. Audience members were given a concert program before his Tone Tests that clearly told them exactly what they would hear, how amazing it would sound, and what an appropriate response would be:

“Those who hear this test will realize fully for the first time how literally true it is that Mr. Edison has made possible the re-creation of the artist’s voice. No more exacting test could be made to demonstrate that the New Edison actually does re-create the voice of the artist than to play it side by side with the artist who made the records. This is the final proof. Close your eyes. See if you can distinguish the voice of the New Edison from that of the artist. Did you ever believe it possible to re-create a voice? Note that the voice of the artist and the voice of the Edison are indistinguishable” [emphasis is mine] [3].

Figure 2: Another Edison Tone Test where biases related to sight and smell may have compromised the results based on the many listeners covering their noses. Did a bad case of singer's halitosis make it possible to identify the live performance based on smell alone?

Other Live-versus-Recorded Demonstrations
Following Edison’s live-versus-recorded demonstrations, other tests were conducted in the 1950s by Harry Olson at RCA, G.A. Briggs at Wharfedale, and Peter Walker at Quad [4]. A common problem with these demonstrations was double reverberation: the reverberation of the room was heard both in the recording, and again when it was reproduced through loudspeakers in the same room. This made it easier for listeners to tell the difference between the recorded and live performances.

Acoustic Research's Live-Versus-Recorded Demonstrations
During the 1960s, Acoustic Research (AR), an American loudspeaker company, performed over 75 live-versus-recorded concerts in cities around the USA featuring The Fine Arts String Quartet and the AR-3 loudspeaker [5],[6]. To solve the double reverberation problem, the recordings of the quartet were made in an anechoic chamber, or outdoors. Outdoor live-versus-recorded demonstrations had the added benefit that there were no room reflections in either the recording or the live performance. This made the demonstrations less sensitive to off-axis problems in the microphones and loudspeakers. It also eliminated the challenge of capturing and reproducing the complex spatial properties of a reverberant performing space.

The AR demonstrations apparently generated an enormous amount of free publicity in newspapers and audio magazines, where it was reported that the reproduction of the recordings was virtually indistinguishable from the live performance. AR sales increased dramatically, to the point where in 1966 AR apparently held a 32% share of the loudspeaker market in the United States.

A Live-Versus-Recorded Method For Testing Loudspeaker Accuracy
Edgar Villchur, the head of Acoustic Research, to his credit, was a firm believer that loudspeakers should accurately reproduce the art (the recorded music) and not editorialize or enhance it. In a 1962 paper, he described a live-versus-recorded method for evaluating the accuracy of loudspeakers [7]. The method used a reference loudspeaker (the live performance) that was placed in the listening room with the loudspeaker-under-test. The goal of the loudspeaker-under-test was to accurately reproduce a recording of the reference loudspeaker playing white noise in an anechoic chamber. The original white noise was also fed to the reference loudspeaker during the listening test. The more similar the loudspeaker-under-test sounded to the reference speaker the more accurate it was, at least in theory.

Villchur acknowledged that the sensitivity and validity of the method depended on the quality of the reference loudspeaker, its directivity, and the choice of program material. White noise was more revealing of loudspeaker inaccuracies than music. His reference loudspeaker consisted of a single 2-inch midrange from an AR-3 loudspeaker because he found that using multiple drivers caused acoustical interference that was audible in the anechoic chamber, but not so audible in a reverberant listening room; these differences would produce errors in the listening test. One wonders how a tiny 2-inch driver could have produced adequate high treble and low bass without distortion. These limitations would significantly restrict the accuracy and usefulness of this listening test method.

Another problem with this method was that the anechoic loudspeaker recordings were made at a single point in space, and did not capture the directivity and off-axis characteristics of the reference loudspeaker. Unless the speaker-under-test had exactly the same directivity and off-axis characteristics as the reference loudspeaker, it could never sound exactly the same in a reflective listening room. To compensate for these errors, Villchur used a trial-and-error process to find the best microphone position relative to the reference loudspeaker, where the timbre of the anechoic recording best matched the timbre of the reference loudspeaker when placed in a room. Adjusting the recording to mimic the sound of the live performance is the reverse of what Edison’s musicians would do, but essentially it produced the same bias. (Edison would have been proud!)

Finally, it is not clear how Villchur controlled loudspeaker positional biases when comparing the reference loudspeaker to the loudspeaker-under-test. Loudspeaker positional biases have been shown to produce audible effects that can be larger than the audible differences between different models of loudspeakers [9]. At Harman, these positional biases are eliminated via an automated speaker shuffler that places each loudspeaker in the same position in the room.

Summary of Problems with Live-versus-Recorded Tests
By today’s standards, the live-versus-recorded tests performed to date lack the necessary scientific controls and rigor to consider their results or conclusions accurate, repeatable and valid. Below are a few of the most significant psychological, physical, methodological or experimental listening variables that plague these types of tests. While it is possible to control some of these variables, others are either impossible, impractical or too expensive to control.

Sighted and Cross-Modality Biases
To date, most of the live-versus-recorded tests have been performed sighted, where non-auditory cues were available to allow the listener to identify whether they were hearing the live or reproduced sound source. These tests could have been easily made blind via an acoustically transparent curtain; however, scientific validity was apparently not the primary purpose of the test. The visual cues from the musicians (bowing, lip syncing) would also enhance the realism and presence of the reproduction, a well-known cognitive effect observed in research on binaural and virtual reality displays.

Listener Expectation, Authority Bias, Group Interaction Bias
In many of the public live-versus-recorded demonstrations, listeners' expectations were manipulated by knowledge given to them by the organizers of the demonstrations. In some cases, listeners were told what the expected response should be before the test began (see Edison's concert programs above). In large group settings, listeners' responses can be easily swayed by the opinions and reactions of other members of the group (a herd mentality), especially when an authority figure is present. In principle, these biases could be removed from live-versus-recorded tests by repeating the test for each individual listener. In practice, the live and recorded performances would have to be replicated for every listener, which makes the tests too difficult, expensive, time-consuming, and impractical to use.

Qualifications of Listeners
None of the live-versus-recorded tests I've read about have reported the hearing and critical listening qualifications of the listeners who participated in them. These are important variables in the sensitivity and reliability of the test results, and can be easily quantified.

Live and Recorded Performances Must Be Identical
For live-versus-recorded tests to be valid, the live and recorded performances should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues other than sound quality that allow listeners to readily identify the live and recorded versions. MIDI-controlled instruments (e.g. player pianos) are but one example of how this problem could be resolved.

Positional Biases from Live and Reproduced Sound Sources
Unless the live and reproduced (e.g. loudspeakers) sound sources occupy the same physical locations, the listener can always identify the live versus recorded versions based on the localized positions of the sound sources.

Errors in the Recording
The usefulness of live-versus-recorded methods for perceptual measurements of sound quality in the playback chain is severely limited by errors in the recording. The recording errors are not easily separated from the errors in the playback chain (see circle-of-confusion). Microphones and microphone techniques both contain errors that limit the timbral, spatial and dynamic accuracy of the recordings through which we judge loudspeakers. Apparently the most effective live-versus-recorded demonstrations were conducted outdoors - effectively an anechoic environment - where the off-axis performances of the microphones and loudspeakers, and the complex spatial cues of a reflective room were largely removed as factors from the experiment. However, results from outdoor live-versus-recorded tests cannot be generalized to how the loudspeakers would perform in real rooms, where the off-axis sounds provide a significant contribution towards the listener's impression of the loudspeaker.

Lack of Proper Scientific Protocols, Listener Response Data, Statistical Analysis, Results
The most interesting characteristic of live-versus-recorded tests is that they never seem to provide listener response data, statistical analysis or published results. Eyewitness reports written in newspapers or magazines do not constitute scientific evidence.
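As a minimal sketch of the kind of analysis that is missing, assume a hypothetical blind test in which a listener is asked on each trial whether the source is live or recorded. A one-sided binomial test then gives the probability of scoring at least that well by guessing alone. The trial counts below are invented for illustration and do not come from any of the demonstrations described here.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of at least `correct` right answers in `trials` by guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical data: 20 blind trials, source correctly identified 15 times.
p = binomial_p_value(correct=15, trials=20)
print(f"p = {p:.4f}")  # about 0.021: unlikely to be pure guessing
```

A small p-value would suggest the listener could tell live from recorded; a large one would be consistent with guessing. Either way, it is a number that can be reported and checked, which eyewitness accounts cannot provide.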

Accuracy is Not Applicable to Most Recordings Made Today
Most recordings made today are not intended to sound like the live performance. Anyone who heard Taylor Swift's live performance with Stevie Nicks at the 2010 Grammy Awards understands why. About 90% of commercial recordings are studio creations consisting of a series of overdubs, processed with auto-tuning, equalization, dynamic compression, and reverb sampled from an alien nation. For these recordings, there is no equivalent live performance to which the recording/reproduction can be compared for accuracy. The only reference is what the artist heard in the recording control room. If the important performance aspects of the playback system through which the art (the music and recording) was created can be reproduced in the home, then the consumer will hear an accurate reproduction of the music, as the artist intended. It is possible to achieve this if we adopt a science in the service of art philosophy towards audio recording and reproduction.

Conclusions
In reviewing the history of live-versus-recorded tests, most have been performed as elaborate sales and marketing demonstrations designed to fool listeners into believing that a product sounded much better and more accurate than it actually was. While live-versus-recorded tests have proven their merit as an effective marketing and sales tool, they have not yet proven themselves as a serious method for scientific experiments intended to advance our psychoacoustic understanding of music recording and reproduction.
The reason for this, I believe, is that live-versus-recorded tests do not adequately control important listening test nuisance variables, a prerequisite for accurate, reliable and scientifically valid results. It is not entirely coincidental that (to my knowledge) none of the live-versus-recorded tests to date has produced a single scientific publication or new psychoacoustic knowledge.

Hopefully, you now understand why I don’t conduct live-versus-recorded loudspeaker listening tests.

References
[1] Harvith, J., and Harvith, S., Edison, Musicians, and the Phonograph: A Century in Retrospect, Greenwood Press, N.Y. (1987).

[2] Andre Milliard, “Edison’s Tone Tests and the Ideal of Perfect Sound Reproduction,” from Lost and Found Sounds, NPR.

[3] Program for Edison Demonstration http://www.nipperhead.com/old/tonetest04.htm

[4] Wharfedale History: http://www.wharfedale.co.uk/About/History/tabid/66/Default.aspx

[5] Acoustic Research http://en.wikipedia.org/wiki/Acoustic_Research

[6] Edgar Villchur, http://edgarvillchur.com/

[7] Villchur, Edgar, “A Method of Testing Loudspeakers with Random Noise,” J. Audio Eng. Society, Vol. 10, Issue 4, pp. 306-309 (October 1962).

[8] Kissinger, John R., “The Development of the Simulated Live-vs-Recorded Test into a Design Tool,” presented at the 35th AES Convention, preprint 609 (October 1968).

[9] Olive, Sean E.; Schuck, Peter L.; Sally, Sharon L.; Bonneville, Marc E., “The Effects of Loudspeaker Placement on Listeners' Preference Ratings,” J. Audio Eng. Society, Vol. 42, Issue 9, pp. 651-669 (September 1994).

terryj · Jul 10, 2010

Hooray, finally a forum where I might have to abstain from a nightcap before replying!

But, as I have had a nightcap, I might have to keep it simple.

So, HOW is it even conceivable that people could have thought those recorded versions were indistinguishable from reality?

Isn't it like the old movies, where people ran from the theatre because they thought they would be run over by the train??

honestly, is it credible that just because they were sighted tests people could still not distinguish the two? Would a hearing test help explain the 'results'? That the recording and live performance were not identical, does that really sway the results?

Those sort of objections are, surely, clutching at straws? In other words, from our current perspective there is no way the two could be confused.

So I tend toward the more mundane explanation...we only have the word of the exhibitors that the audience could not tell them apart!

(exit poll...crikey, they sounded bloody terrible!!)

official report..no difference was detected!

It's all in the spin hahah!

tonmeister2008 · Jul 10, 2010

terryj said:
Hooray, finally a forum where I might have to abstain from a nightcap before replying!

But, as I have had a nightcap, I might have to keep it simple.

So, HOW is it even conceivable that people could have thought those recorded versions were indistinguishable from reality?

Isn't it like the old movies, where people ran from the theatre because they thought they would be run over by the train??

honestly, is it credible that just because they were sighted tests people could still not distinguish the two? Would a hearing test help explain the 'results'? That the recording and live performance were not identical, does that really sway the results?

Those sort of objections are, surely, clutching at straws? In other words, from our current perspective there is no way the two could be confused.

So I tend toward the more mundane explanation...we only have the word of the exhibitors that the audience could not tell them apart!

(exit poll...crikey, they sounded bloody terrible!!)

official report..no difference was detected!

It's all in the spin hahah!

It is amazing that people were (apparently) fooled: the title of my story could have easily been "Why Live-vs-Recorded Listening Tests Do Work (in fooling people)".

You mention people running out of theaters because they were afraid of the train, which I believe was because a) it was a relatively new experience for them, and b) it was perhaps a herd reaction (one person runs, the others think the theatre is on fire, etc.).

One thing that is clear is that the live-versus-recorded demonstrations such as Edison's were highly staged and highly biased. There is also no evidence that any data was ever gathered, so we don't really know how many people were fooled, and how many people weren't. In the end, the marketing spin-meisters and the newspaper reports were the only voices we heard from. So, you have every right to be cynical about these tests.

There is a good summary of the Tone Tests by Edison historian Andre Milliard here: http://www.npr.org/ramfiles/lnfsound/20001010.lnfsound.05.rmm

You need a Real Audio player to listen to it.

Gregadd · Jul 11, 2010

I get live vs recorded tests all the time. Just go to a concert where the band plays a recording of their work while they take an intermission. No one is fooled.

Phelonious Ponk · Jul 11, 2010

Well, it's good to know that even Edison was seeking accuracy, disappointing to learn that there was such a fine line between him and P.T. Barnum.

P

FrantzM · Jul 11, 2010

Hi

Couldn't it be that there is more at play here than simple suggestion? Some notions that we take for granted are learned. Perspective being one of them. For someone who has never heard sound being reproduced, anything that remotely plays music might have sounded mighty extraordinary. I am certain that after a few auditions perspective may have changed ... Just pondering

LesAuber · Jul 11, 2010

Frantz may be onto something there. Even today unless it is extraordinarily bad it takes a while to pick up onto the shortcomings of a new medium when it is introduced. If all they'd heard previously was themselves singing around a player piano for instance there may have been no good point of reference.

tonmeister2008 · Jul 11, 2010

FrantzM said:
Hi

Couldn't it be that there is more at play here than simple suggestion? Some notions that we take for granted are learned. Perspective being one of them. For someone who has never heard sound being reproduced, anything that remotely plays music might have sounded mighty extraordinary. I am certain that after a few auditions perspective may have changed ... Just pondering

Some of the accounts I've read suggest the singer sometimes sang along with the recording and stopped at some point under sighted conditions. This type of demo could more easily trick people, since the ventriloquist effect would work to Edison's benefit: the noises and distortions of the recording would be constant throughout the test, which would reduce their contribution as a factor. If the singer sang at a very low level, then it would be difficult to detect when they started and stopped. The test was not an "Is it live or recorded?" test but rather a "Can you tell when the singer stopped singing along?" test, which is a much different and less sensitive test. It's clear that Edison had a pretty good understanding of human psychology and perception, and how to manipulate those perceptions.

Here is a passage about Perception from Wikipedia (sorry to be quoting from Wikipedia but I'm lazy like everyone else who uses it) that may be relevant. When people are unfamiliar with a new stimulus they will try to fit it into a category that best fits their experience. Perhaps when they heard the recording for the first time, their closest experience was a real voice. The fact that they were told by Edison that it was identical to the real voice surely helped them arrive at that conclusion. As Edison said, "People will hear what you tell them to hear." This is probably more true when they haven't heard it before or don't understand it.

".....The processes of perception routinely alter what humans see. When people view something with a preconceived concept about it, they tend to take those concepts and see them whether or not they are there. This problem stems from the fact that humans are unable to understand new information, without the inherent bias of their previous knowledge. A person’s knowledge creates his or her reality as much as the truth, because the human mind can only contemplate that to which it has been exposed. When objects are viewed without understanding, the mind will try to reach for something that it already recognizes, in order to process what it is viewing. That which most closely relates to the unfamiliar from our past experiences, makes up what we see when we look at things that we don’t comprehend.[5]

This confusing ambiguity of perception is exploited in human technologies such as camouflage, and also in biological mimicry, for example by Peacock butterflies, whose wings bear eye markings that birds respond to as though they were the eyes of a dangerous predator. Perceptual ambiguity is not restricted to vision. For example, recent touch perception research Robles-De-La-Torre & Hayward 2001 found that kinesthesia based haptic perception strongly relies on the forces experienced during touch.[6]..."

from http://en.wikipedia.org/wiki/Perception

Alrainbow · May 19, 2022

This is a great thread love this hope a massive attack dont start lol.
aside of how anyone was fooled back then I can share this a first hand account
at capital audiofest
they had a woman playing a harp
she played in halls and in rooms
now while this a nice thought
it scared me in that I felt no room could get close to this side by side
and i heard her in many rooms
it was to me why are you doing this here !
she killed most all rooms
but one room I felt got it best in being her
this was done live her vinyl playing while she played the exact same song
a hint vpi table
claSS d amps by merrel
and Gary’s gensis audio speakers
now forgive me If I misspelled the amps
Gary looked scared as it started lol.
the amp guy I think it was him made it louder to better match her volume
to me that setup caught the essence of her live harp. It had her tonal balance
timbre and attack
it stayed right with her to me as it perfectly matched her sound
I’m not kidding or making this up it was to me.
