Listening Room Intelligibility Test

Art Noxon

WBF Technical Expert (Room Acoustics)
Mar 29, 2011
Eugene, OR
www.acousticsciences.com
Hello all

I’m Art Noxon and I’m very pleased to be invited to work in the Expert position here and represent the audiophile version of room acoustics.

I believe I need to begin by apologizing in advance should I display anything but absolute neutrality towards all acoustic products and techniques. I will do my best to explain and support the appropriate application of any product or technique we are covering, but still, I do have my favorites…..

Where do we begin this adventure? I have selected a listening experience as the opening salvo. Audiophiles tend to want to listen first and talk second. It is the nature of the golden ear crowd. I will always assume I am addressing golden ears, even though there may be some technical people and collectors out there as well.

High end audio started for me in 1984, having just invented the TubeTrap. In the beginning I spent a lot of time trying to figure out why audiophiles got so excited about putting a pair of TubeTraps in the front corners of their listening room. Yes, I was pleased and proud, but being the hard-core engineer that I am, I needed to measure, to quantify, what caused their thrill. I tried all the standard tests known then, which are the same ones known today: sine sweep, pink noise, RT-60s, narrow spectrum analysis, TEF waterfalls, and even Q changes of resonant modes.

What did I get? The best was about 1 dB adjustment in anything, usually less. Certainly not enough to warrant the pleasure those trapped corners gave the audiophile crowd. I published my findings (failure to find) in the AES and didn’t know what to do after that. In the meantime we had gotten so bored with reverb chamber testing of TubeTraps that we tried to speed the process up, and we were getting very illuminating results using rapid short tone bursts. Not only in the chamber but in real rooms we were documenting changes upwards of 10 to 15 dB for rooms with and without TubeTraps in the corners.

Coincidentally at this time, 1986, I was going to the SynAudCon meetings. Victor Peutz, from the Netherlands, showed up and presented the new/next wave of audio performance testing: Intelligibility. I couldn’t believe my ears. I realized right then and confirmed with him that our fast tone burst testing was actually a tonal intelligibility test. Much more has come to light since that first peek into the wonderful world of audio playback intelligibility, which turns out to be what really matters the most in hifi playback.

And so, here is the link to the MATT TEST for you to click on. Included here is a description of the MATT test, a training demo and a downloadable MP3. MATT means Musical Articulation Test Tones. This test is also on TRACK 19 of THE STEREOPHILE TEST CD 2 and many audiophiles have that reference CD. Do your own A/B test. Listen to the signal first on headphones and then listen to your audio system try to play it. We’ll talk about what you hear soon enough.

For now, report into the forum about what you heard……………………………… Arthur Noxon
 
Fascinating...+1. Thank you, all.
 
Please be careful when running the MATT test. You will quickly learn how really lousy your room may be :eek:

Seriously, it is very revealing. Art may be discussing it in the future, but there was/is a way to run the test and send ASC the data and they will return the results and you can then see how your room responds -- usually not a pretty sight.

This should be interesting.
 
Even a very quick run-through was very interesting. I heard two basic elements -- the pulsing tone that rose and fell, and a background rhythmic click which pulsed separately by speeding up and slowing down. When I switched from phones to speakers, I lost some of the background click. It seemed more recessed and I seemed to lose some of the very fast clicks, the polyrhythms, if you will. Not much. Most of it was still there, but pushed a bit further into the background. Oddly enough, lowering the volume of the speakers seemed to help.

Tim
 
Tim is working hard and making excellent observations. Explaining what a sound sounds like is rough. No one wants to miss something subtle but obvious, and no one wants to read into a sound something they imagine hearing. Tim managed to open up Pandora's sonic box in one night of listening. So, let's dive in and I'll bring you up to date, regarding his observations, as much as I know so far... However, if you haven’t listened to the MATT test signal on headphones and then in your room, like Tim did, please listen before you read the body of this post. Listen for the ticking and then for when the ticking just disappears even though the tone bursts still sound fine. That's the effect we are talking about right now. As you can imagine there is a backstory here and let me tell it to you before I get to the punch line.

It was 1992 and we were asked for a clean copy of the MATT test so it could be published in Stereophile Test CD 2. We had been using our original one for about 4 years by then. This was our DIY version, all analog. We used one oscillator to generate a slow sine sweep and ran that signal through a second oscillator that turned the audio level on and off every 1/16th second. This left lots of vertical “on” attack transients and vertical “off” release transients in the signal. At the time it was all we could do. But still, we found ourselves having to apologize for that motorboating effect burned into our "dirty" test signal.
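As an illustration only, the gated sweep described above could be sketched in Python. This is not the original analog signal or the published track. The sample rate, the duration, the linear sweep law, the equal on and off times, and the function name `gated_sweep` are all assumptions; the 28 to 780 Hz range is the MATT range quoted later in this thread.

```python
import math

def gated_sweep(duration_s=4.0, rate=8000, f_start=28.0, f_end=780.0, gate_s=1 / 16):
    """Return a sine sweep that is hard-switched on and off every gate_s seconds.

    Assumed for illustration: linear sweep, equal on/off times, 8 kHz rate.
    """
    n = int(duration_s * rate)
    gate_len = round(gate_s * rate)          # samples per on (or off) interval
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / rate
        freq = f_start + (f_end - f_start) * t / duration_s
        phase += 2 * math.pi * freq / rate   # accumulate phase so the sweep is continuous
        gate_on = (i // gate_len) % 2 == 0   # even intervals are "on", odd are "off"
        samples.append(math.sin(phase) if gate_on else 0.0)
    return samples

sweep = gated_sweep(duration_s=1.0)
```

Because the gate switches at arbitrary points in the sine cycle, each burst starts and stops with a vertical edge, which is where the tapping sound in the "dirty" signal comes from.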

At that time recording studios were converting to digital. We contracted with a local studio to create a perfect sequenced tone burst test, using Hamming Window waveform envelope shaping, which would open and close each tone burst at zero volts and so get rid of the dirty part of the signal, the tapping transients. We hoped a digitally generated signal would sound professional instead of what we had cabled together, and as well, have a much quieter noise floor.
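A minimal sketch of a window-shaped tone burst follows, assuming an 8 kHz sample rate and a 1/16 second burst; the function names are hypothetical and this is not the studio's actual process. One detail worth knowing: the textbook Hamming window (coefficients 0.54 and 0.46) starts and ends at 0.08 of full amplitude rather than at exactly zero, while the closely related Hann window does reach zero, so both are shown.

```python
import math

def window(n, kind="hamming"):
    """Raised-cosine window of n points: 'hamming' (0.54/0.46) or 'hann' (0.5/0.5)."""
    a0 = 0.54 if kind == "hamming" else 0.5
    return [a0 - (1 - a0) * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def shaped_burst(freq_hz, rate=8000, burst_s=1 / 16, kind="hamming"):
    """One tone burst whose envelope fades in and out instead of switching abruptly."""
    n = round(burst_s * rate)
    w = window(n, kind)
    return [w[i] * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

burst = shaped_burst(100.0)
```

With the envelope fading in and out, the burst has no vertical edges, which is why the clean version lost the tapping transients.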

We paid a lot for that signal and finally we sat down for a listen. It certainly sounded clinically proper, good clean tones, no transient tapping distractions, but just maybe a little too clean, too sterile. Something kept bothering us and we finally realized that the problem was with that tapping noise. It was the first part of the tone burst to get messed up, as the room begins to lose its intelligibility rating. Long before the tone bursts begin to slur and become garbled, the clarity of the tapping disappears. We had learned an expensive lesson. That embarrassing tapping part of the signal is actually the most sensitive part of the intelligibility signal. And yet, we didn’t know how or why, or what to do with this idea, or if we were just imagining things….

Now we were in a double quandary. We were under pressure to get the track out because of the Stereophile deadline. Whatever we did, we knew it would be memorialized forever and we felt a sense of responsibility. We had a dirty, DIY signal that we knew was not a very academically respectable signal but it contained an early warning system, a precursor to impending poor intelligibility. The tapping was the sonic canary in our mine shaft.

We finally decided that doing the right thing was better than looking like the right thing. We opted to forsake the opportunity to publish a respectably clean signal and instead just publish that dirty old signal. It still worked fine for what we were doing then, which was tone burst clarity, and in addition we believed it held a clue to a secret of sonic perception, something we could hear but could not understand. We had no further time to sort things out. We published our dirty signal with the tapping feature burned in. Our plan was that we, as a company or even individuals, might not make it but possibly the MATT test would live on and someone, somewhere would notice this odd precursor effect and do more work on our early discovery. It was our way to put a note in the bottle for someone in the distant future to find.

What is very interesting in retrospect is that when the ticking disappears, it is the “upper harmonics” in the attack transient that are disappearing as the intelligibility level drops. Intelligibility is the signal to noise ratio. It is how loud the sound is during the 1/16th second it is on compared to how quiet things get when the signal is turned off. It is measured in dB, and ranges from zero dB, a gargling type of sound, to maybe 15 dB, a yodeling type of sound.
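One crude way to put a number on "how loud during the on interval versus how quiet during the off interval" is sketched below. It assumes a recording that starts exactly at an "on" interval and alternates on and off in blocks of `gate_len` samples. The function name, the alignment assumption, and the pooled-power approach are illustrative only; this is not a description of how ASC processes MATT recordings.

```python
import math

def on_off_ratio_db(samples, gate_len, floor=1e-20):
    """Ratio in dB of mean power in the 'on' blocks to mean power in the 'off' blocks."""
    on_power, off_power = [], []
    for start in range(0, len(samples) - gate_len + 1, gate_len):
        block = samples[start:start + gate_len]
        power = sum(v * v for v in block) / gate_len
        # Even-numbered blocks are assumed "on", odd-numbered blocks "off".
        (on_power if (start // gate_len) % 2 == 0 else off_power).append(power)
    mean_on = sum(on_power) / len(on_power)
    mean_off = max(sum(off_power) / len(off_power), floor)  # avoid log of zero
    return 10 * math.log10(mean_on / mean_off)

# Synthetic check: the room "fills in" the gaps at one tenth of the burst amplitude.
rate, gate_len, freq = 8000, 500, 160.0
recording = [
    (1.0 if (i // gate_len) % 2 == 0 else 0.1) * math.sin(2 * math.pi * freq * i / rate)
    for i in range(4 * gate_len)
]
ratio_db = on_off_ratio_db(recording, gate_len)  # close to 20 dB for this synthetic case
```

In a room that keeps ringing through the gaps, the off blocks hold more energy and the figure falls toward zero dB.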

What we have since learned from psychoacoustic studies by others and the notable commercial successes and failures of synthetic music generators is that the attack transient detail is critically important to the musical quality of any note we hear. This detail is contained in the presence of, and the phase or timing alignment of, the upper partials in the attack transient, and not in the sustain. Wait a minute…what did I just say?

As the dynamic intelligibility just begins to become compromised, the first thing that goes is the attack transient detail, which is the part of the signal that contains the musicality component. First musicality becomes blurred and then dynamics become blurred as the intelligibility of the audio playback system degrades.

Wow. We didn’t know, when we decided to publish the dirty version of our musical intelligibility signal some 18 years ago, that we would manage to live long enough, to gather enough information from our customers and other researchers, so that we would be the ones who would finally figure out, about 2 to 3 years ago, and understand what was happening. That note in the bottle floated out there in the sea for 16 years. By the time we found and opened up that bottle again, we had learned enough and could finally understand what that note in the bottle meant.

Until now, I didn’t know that the note in the bottle was really a musical note. …………..Arthur Noxon
 

Art, that was just excellent. Just one thing I don't quite understand -- Gargling and yodeling? I'm not hearing any of that. I'm hearing the pulse tone, which doesn't seem to lose anything in the transition from phones to speakers, and the ticking, which just seems to recede a bit when I switch to speakers. No gargle. No yodel.

Tim
 
This is very fascinating stuff: I keep talking about using cymbals and such for testing system performance, but this sounds very much like it encapsulates a test mechanism for highlighting the behaviours I keep talking about. In simple terms, a normal system will have its dynamic intelligibility compromised as you increase the volume; one that is functioning at peak level, as Roger terms it, will not lose the dynamic intelligibility up to the amplifier's clipping limit. As an aside, I have not listened to the waveform but understand exactly the significance and importance of the attack transient detail being rendered clearly.

It seemed more recessed and I seemed to lose some of the very fast clicks, the polyrhythms, if you will. Not much. Most of it was still there, but pushed a bit further into the background. Oddly enough, lowering the volume of the speakers seemed to help.
Tim gave the game away so to speak: this is a perfect description of how normal systems start to fail to reproduce correctly, with the most critical phrase of all being "lowering the volume of the speakers seemed to help". This is a key indicator of where weaknesses still are in a setup; as you increase the volume the problems become more audible because the system is becoming more susceptible to those weaknesses, which causes the auditory illusion to fail.

Frank
 
Trust me, Frank, I have 325 watts rms per channel pointed at my head from 4 feet away. My amplifiers are not clipping. If they were, I'd know it.

Tim
 
Tim, this has nothing to do with clipping, this is all about system "misbehaviour" often well, well down from anything near the amplifier's theoretical maximum volume. In my early experiments with the Perreaux, a 200 watts per channel unit, distortion figures a high end tube amp would die for, it was very easy to hear where this big fellow started to fall over, way short of clipping; a dealer pushed a GAS amp for me to try, another big bruiser, and it was truly appalling. It could get barely above a whisper before you could hear these "distortion" components ...

Frank
 
Yeah, whatever, Frank. We were talking about room acoustics, in case you wanted to know, though I'm sure your soldering iron is the universal panacea for those issues too.

Tim
 
If you don’t hear dynamic slur when the MATT Test signal is played over your speakers, you have a pretty good, pretty fast room. If you record what you hear and send it to ASC, we’ll process the signal and send you the results.

All my comments are based on what I know to be typical listening room footprints, where the listener is back from the speaker maybe 10'. But if your speakers are 4' away, you are listening inside the nearfield and your signal to room ratio is huge. In my experience we call this tiny listening footprint an example of an audiophile "hiding from the room" or "huddling inside the room". If one has refined taste and can't stand listening while set back 10', you'll scoot forward to 8', and it gets better. But if it is still not good enough, you'll scoot even closer, and closer until you do get the right signal to noise ratio. You are down to 4'? Sorta like loosely wearing a giant pair of Stax headphones? Sorry, couldn't help it...

Lower volumes do not excite flexible surfaces strongly because the power delivered equals the power being attenuated. Their natural friction keeps them in the linear response area, and they are not overdriven. Loud music injects disproportionately more distortion into the flexible surfaces of the room, more than the natural damping coefficients can deal with, and these surfaces go into uncontrolled resonance. They act as if they were extra loudspeakers in the room, wired into some random reverb circuit, and musical dynamics are no longer faithfully followed. Most audiophiles know or at least have a feeling for how loud they can drive their room, above which the room begins to mechanically break up.

I think there is something else working here. Recall or look up the Fletcher-Munson curves. They illustrate that our sensitivity to bass falls off faster than our sensitivity to treble as the overall sound level is reduced. This means as the volume goes down we hear disproportionately less bass. So unintelligibility due to uncontrolled bass behavior in the room is not perceived to be as loud as the corresponding upper partials of attack transients, and therefore the room sounds more musically accurate. ...…………… Art
 
Sorry to push this point, Tim, but I'm talking about something that goes beyond room acoustics. You mentioned "I seemed to lose some of the very fast clicks, the polyrhythms"; if that was purely a room behaviour thing you should start to pick them up again the closer you move your head to your speakers, the direct sound from the drivers would dominate any effects because of reflected sound. Is that the case?

Frank
 
This is pretty cool stuff, Art. I'm not alarmed by the results I got; the taps just seemed to lose a very small amount of volume moving from headphones to speakers. As I got the volume up high enough in my small room, the quickest taps did begin to blur ever so slightly, but that's to be expected. Still, all of that was from a very brief listen. I have the house to myself in the morning. I'll run it a few more times.

Tim
 
Well, it seems like we have one eager experimenter in the crowd….....

Tim, if you want to hear what the rest of the world listens to, most of the time, step away from your speakers and tell us what you hear…... Most speakers radiate sound in an omni pattern throughout the MATT frequency range, 28 thru 780 Hz. That's why it shouldn't matter much at all where you stand or sit to audition the intelligibility of your system at that location. This part of the test is not about precision stereo listening. It is very simple. It’s about volume speed control.

Right now, in your dialed in listening position, you feel you have 100% intelligibility. You admit to no hint of garbled sound while listening set back 4’ from your speakers. Set the test on repeat. Get up and walk away from your system, go down the hall, maybe into the kitchen. Get away from your speakers and the farther you get away, probably the more garbled they sound. See if you can find a location in your house where you hear what you call 100% totally garbled sound. Now find the position where you hear 50% garbled sound. That’s how most high performance audio systems sound.

At this stage of our adventure we are trying to listen to fundamental bass and midrange performance detail. We are trying to hear and understand something more than how loud our system plays a continuous sustained sound. We now are trying to listen to the speed of our system. We want to know how quickly it responds. A sluggish system does not respond to rapid dynamic level changes. We want to know how sluggish our system is and at what frequency, whether we can hear it, and whether we can measure it. We want a sluggishness test, even if we are afraid of the results.

If the signal rises in sound level by 16 dB in 1/20th second and at our listening chair we only hear a signal level shift of 4 dB in 1/20th second, we’ve got something to worry about. This kind of diminished response to rapid level change equals dynamic compression. It is as if someone hooked a dynamic level compressor into our system. Of course, if we rotate the volume dial between zero and eleven, we get full range volume change. That is steady state volume level change and it works fine. We are going to try to not worry about steady state performance right now. We are worried right now about dynamic performance.
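The 16 dB versus 4 dB example can be worked through numerically. The sketch below borrows the usual compressor convention, in which the ratio is the input level change in dB divided by the output level change in dB. Applying that convention to a room is an analogy made here for illustration, and the function names are hypothetical.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level change in dB to a linear amplitude (pressure) ratio."""
    return 10 ** (level_db / 20)

def compression_ratio(source_step_db, heard_step_db):
    """Compressor-style ratio: dB change in the source per dB change at the chair."""
    return source_step_db / heard_step_db

source_step_db = 16.0   # level jump in the signal, from the example above
heard_step_db = 4.0     # level jump heard at the listening chair

ratio = compression_ratio(source_step_db, heard_step_db)   # 4.0, i.e. 4:1
lost_db = source_step_db - heard_step_db                    # 12 dB of dynamics lost
source_amp = db_to_amplitude_ratio(source_step_db)          # about 6.31 times
heard_amp = db_to_amplitude_ratio(heard_step_db)            # about 1.58 times
```

In amplitude terms the source jumps by a factor of roughly 6.3 while the listener hears a jump of only about 1.6.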

There is a stereo imaging aspect to using the MATT test as well as an attack transient aspect. We are still working on the crudest part of the test, the ability of an audio system to fast track rapid dynamic level changes. We are still trying to learn about what the presence of or lack of dynamic clarity sounds like. It is the garbling factor in the playback system (electronics + room = playback system) that we are still trying to audition.

We need to be careful to keep each aspect of listening (level, level speed, attack, image position and focus) separate as we explore our system and its performance in these separate areas of perception. Notice we are not yet even talking about the pedestrian version of acoustics: wall reflections, modes and reverberation.

We want to develop tactile experience and a vocabulary to communicate about these experiences. Once we can do this, then we get to connect those experiences to our experiences of listening to music. Presently, we are nowhere near done; we are just beginning to scratch the surface of a great and beautiful world, one that for most is always invisible, even though it surrounds and supports our every living minute. ………………Arthur Noxon
 
Thanks, Art. This is extremely cool stuff. I did what you suggested, walking out of the room, and once the sound had to go round 2 corners, it got really garbled. In the room, I hear the fast clicks and the tones (I cheated - I looked at the waveform). But once I left the room, the further away I was, the more garbled it became, and the clicks blurred in together with the tones, and some of the clicks were completely missing.

What was interesting to me was that in room, the volume varied up and down with frequency (like it does with the usual frequency sweep test), but the frequencies at which it went up and down changed with volume (unlike the usual frequency sweep tests). Outside the room, the louder frequencies and softer frequencies are different.

I'll try to record and post the results, but I've just discovered that while I have measurement instruments and ADCs, I don't have anything that I can record a microphone feed with. Looks like I'll have to go to the local Guitar Center and buy a microphone preamp.
 
Tim, if you want to hear what the rest of the world listens to, most of the time, step away from your speakers and tell us what you hear

I did better than that, Art. Yesterday I hauled the system (actives fed by a laptop make this pretty easy) out into the den and set the speakers up on stands. It's not a good room -- hardwood floors with no rug, wood-paneled walls, partially open to the adjacent kitchen, a big brick fireplace between the speakers and windows behind them. There, the seat is about 10 - 12 feet from the speakers. In that room, I lost a lot of the taps and some of the tones became smeared and indistinct. I went back to my near field set up pretty quickly. This comes as no surprise. I've had the system in that room before, and back in the near field set-up there is always more clarity, detail and the imaging is a lot more locked-in.

Tim
 
I was pleased to see you mention electronics in the equation representing elements contributing to the "garbling factor", as you put it. As members of this forum are (too!) much aware this is the factor I have spent some time focusing on, and my own experience is that if you take care as much as you can of the electronics component contributing to the "blurring", then the room aspects are greatly diminished in importance.

All of your terms and phrases for conveying the behaviour of typical systems and where improvements should be looked for are extremely useful: "Get away from your speakers and the farther you get away, probably the more garbled they sound.", "A sluggish system does not respond to rapid dynamic level changes", "This kind of diminished response to rapid level change equals dynamic compression. Someone hooked a dynamic level compressor into our system.", "the ability of an audio system to fast track rapid dynamic level changes" are exactly the characteristics I have noted about conventional setups.

You mention "We are still trying to learn about what the presence of or lack of dynamic clarity sounds like.". What I am trying to convey to people here is how the "presence" of that dynamic quality comes across in a home setting.

we are just beginning to scratch the surface of a great and beautiful world, one that for most is always invisible, even though is surrounds and supports our every living minute
Right on the money! Thank you for bringing this test forward as a clear and unambiguous tool ...

Frank
 
