Psychoacoustics behind great audio reproduction

No argument, John, just a few questions:

jkeny -
- Uli's crosstalk, which is frequency dependent, seems to give a more relaxed listening experience. Why? Because the theory suggests it gives us better localisation cues by correcting the slight misalignment caused by relying strictly on loudness-based localisation.

How does adding crosstalk change the localization cues? Won't they still be loudness and timing based, but with the location of some frequencies moved? How does blurring the stereo image/moving information within the horizontal field change the timing? I get the volume part, as my understanding is that this induced crosstalk moves high frequency information away from the speaker and toward center, where it is more phantom, less direct. Wouldn't that make the crosstalk effect variable, based on the imaging properties of the system? Meaning that a system that images very well, that projects particularly strong phantom images between the speakers, would reduce volume less and get less of this effect?

This is an interesting anomaly which occurs because we naturally localise with both ILD (loudness differences of the same signal arriving at our ears) & ITD (timing differences of the signal arriving at our ears). When ITD is taken away (not used in the recording process), ILD becomes less than 100% effective at localisation & leads to some smearing of the location of the sound, because relying on ILD alone, we psychoacoustically don't perceive all frequencies of an instrument's signal as coming from the same place.

You lost me. Can you give me an example of when timing differences are not used in the recording process?

Correcting the psychoacoustic frequency variability of the loudness would seem to be worthwhile in addressing this. One way of correcting this by slightly shifting the frequency elements of the source signal is to introduce some crosstalk which varies with frequency.

How does it vary? Are specific percentages of crosstalk applied to specific frequency ranges?

This will probably only work for those recordings where studio panning is used & maybe not for a recording of a real venue done with good microphone technique.

That would be good news for the technique, as panning placement of instruments is nearly universal. Even classic jazz and classical recordings used/use panning. Mics may not be right on top of the instruments; there may not be isolation of instruments by track as there is in a lot of modern studio recordings, and there will be some "bleed" of instruments and sections into the mics of other instruments and sections, but ambient recordings, made from a listening position with a stereo pair of mics, are very rare in professional recording.

That's a lot of questions. If you can just refer me to sources, I'd be happy to do my own research, but it appears that we are theorizing that by blurring the stereo image, we can get a better stereo image. That seems unlikely and I'd like to dig in deeper and see if there could be something else going on.

Tim
 
It's not dependent on speaker crossover, if that's what you mean?
No, I meant the crossover from the range with ITD dominance to ILD dominance.

Many references will be found to this statement if you search for "envelope ITD"
"At high frequencies, interaural time differences (ITDs) are conveyed by the sound envelope. Sensitivity to envelope ITDs depends crucially on the envelope shape."
Makes sense since the envelope is, essentially, an LF carrier for the high frequencies.
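That "envelope as LF carrier" point can be illustrated with a crude envelope follower (full-wave rectify, then one-pole low-pass). The 200 Hz cutoff, the 5 kHz carrier and the 10 Hz modulation below are arbitrary choices for the sketch, not values from the thread:

```python
import numpy as np

def envelope(x, fs, fc=200.0):
    """Crude envelope follower: full-wave rectify, then one-pole low-pass."""
    a = np.exp(-2.0 * np.pi * fc / fs)  # low-pass pole
    env = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(np.abs(x)):
        acc = (1.0 - a) * s + a * acc
        env[i] = acc
    return env

fs = 48000
t = np.arange(fs) / fs
# 5 kHz carrier, amplitude-modulated at 10 Hz: the recovered 10 Hz
# envelope is the low-frequency signal that can carry ITD information
# even though the carrier itself is far above the ITD-sensitive range.
am = (1.0 + 0.9 * np.sin(2 * np.pi * 10 * t)) * np.sin(2 * np.pi * 5000 * t)
env = envelope(am, fs)
```

After the filter settles, `env` swings with the 10 Hz modulator while the 5 kHz carrier is smoothed away, which is exactly the low-frequency content the quoted papers say conveys envelope ITDs.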
 
Two quotes from the link I posted on page one of this thread.

"Ambiophonics produces two improvements in the sound—spatial improvements and clarity/tonality improvements. The spatial improvements include the creation of a wider, deeper stage with better horizontal and depth imaging. The spatial performance is well understood. One can measure ILD and ITD cues recorded by studio microphones and mathematically predict where the listener will hear the image. One can also predict the change in image location when the ILD and ITD cues are distorted by crosstalk in conventional stereo. (See, for example, Glasgal’s Tonmeister Symposium paper, 2005.) Hence, it is well understood how crosstalk cancellation increases stage size and improves imaging. Perhaps more important than its spatial performance are Ambiophonic’s clarity/tonality improvements. Even when listening to a single instrument, the instrument will have better clarity and richer tonality after crosstalk cancellation. Instruments will sound more real compared to conventional stereo. Crosstalk cancellation reduces stereo’s unnatural four sound presentations to the ears to the two that we hear with live sound. "

"Ambiophonics can produce a “you-are-there” large ensemble experience rather than the cramped sense of “they are here” often delivered via the stereo triangle. It transports you to the recording site whereas conventional stereo seems to transport the instruments to your listening room. Listen to Ambiophonics long enough for your ear/brain to accommodate to the larger stage, improved imaging, greater clarity, and improved tonality. Listen long enough to get used to the sound. Then press the button that returns the sound to conventional stereo. The shock of the change will be like a slap in the face."

This is the difference between 2 channel stereo and enhanced stereo. Now I already have a passive circuit that produces this Ambiophonic profile or enhanced stereo. Maybe someone who is a lot smarter than me can research and build a similar passive circuit.
 
No argument, John, just a few questions:

jkeny -

How does adding crosstalk change the localization cues? Won't they still be loudness and timing based, but with the location of some frequencies moved? How does blurring the stereo image/moving information within the horizontal field change the timing? I get the volume part, as my understanding is that this induced crosstalk moves high frequency information away from the speaker and toward center, where it is more phantom, less direct. Wouldn't that make the crosstalk effect variable, based on the imaging properties of the system? Meaning that a system that images very well, that projects particularly strong phantom images between the speakers, would reduce volume less and get less of this effect?
Best to quote Uli: "in the case of ILD we get a different localisation of frequencies played with a constant ILD relationship. Lower frequencies tend to be located closer to the centre whereas high frequencies get located closer to the speaker with the higher amplitude." So Uli's technique of adding variable crosstalk based on frequency is an attempt to correct this localisation issue.
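Uli's actual filter curves aren't published in this thread, but the general shape of the idea can be sketched: high-pass each channel and feed a small amount of that high-frequency content into the opposite channel, nudging HF image positions back toward the centre. The cutoff and mix amount below are placeholders, not Uli's values:

```python
import numpy as np

def one_pole_lowpass(x, fs, fc):
    """First-order low-pass used as a building block."""
    a = np.exp(-2.0 * np.pi * fc / fs)
    out = np.empty_like(x)
    acc = 0.0
    for i, s in enumerate(x):
        acc = (1.0 - a) * s + a * acc
        out[i] = acc
    return out

def add_hf_crosstalk(left, right, fs, fc=2000.0, amount=0.15):
    """Frequency-dependent crosstalk: only content above ~fc is fed across.

    fc and amount are illustrative placeholders, not Uli's settings."""
    l_hf = left - one_pole_lowpass(left, fs, fc)   # first-order high-pass
    r_hf = right - one_pole_lowpass(right, fs, fc)
    return left + amount * r_hf, right + amount * l_hf
```

With this shape, low-frequency content passes essentially untouched (it already localises toward the centre under ILD) while high-frequency content leaks slightly into the opposite channel, which is one plausible reading of "crosstalk which varies with frequency".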

You lost me. Can you give me an example of when timing differences are not used in the recording process?
Mostly in studios where panning is used to create false width. I did say that it is better to talk about more natural recording of an actual audio event rather than get caught up in studio techniques.

How does it vary? Are specific percentages of crosstalk applied to specific frequency ranges?
Yes, I believe so, but that is something you need to ask Uli on the other thread in which you are also participating.
That would be good news for the technique, as panning placement of instruments is nearly universal. Even classic jazz and classical recordings used/use panning. Mics may not be right on top of the instruments; there may not be isolation of instruments by track as there is in a lot of modern studio recordings, and there will be some "bleed" of instruments and sections into the mics of other instruments and sections, but ambient recordings, made from a listening position with a stereo pair of mics, are very rare in professional recording.
Well you seem to have answered your own question above "You lost me. Can you give me an example of when timing differences are not used in the recording process?"

That's a lot of questions. If you can just refer me to sources, I'd be happy to do my own research, but it appears that we are theorizing that by blurring the stereo image, we can get a better stereo image. That seems unlikely and I'd like to dig in deeper and see if there could be something else going on.

Tim
Most of the answers are in Uli's thread, Tim, or I'm sure he can give you references to follow up.
 
Just to clarify before too much confusion sets in - the goal of Uli's frequency dependent crosstalk addition is the same goal as the Ambiophonics approach - an attempt to provide better perceptual localisation - but the two approaches are diametrically opposed, from what I understand. Uli uses added crosstalk & Ambiophonics uses crosstalk cancellation.
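To make the contrast concrete, a crosstalk *cancellation* scheme works in the opposite direction: each output subtracts a delayed, attenuated copy of the other channel, anticipating the acoustic leakage around the head. This is only a sketch in the spirit of recursive cancellers such as Glasgal's RACE; the delay and attenuation values are placeholders, not a tuned design:

```python
import numpy as np

def cancel_crosstalk(left, right, fs, delay_s=90e-6, atten=0.95):
    """Recursive crosstalk cancellation sketch (placeholder parameters).

    Each output channel subtracts a delayed, attenuated copy of the
    *other* output, pre-compensating the acoustic crosstalk path."""
    d = max(1, int(round(delay_s * fs)))
    out_l = np.zeros(len(left))
    out_r = np.zeros(len(right))
    for n in range(len(left)):
        fb_r = out_r[n - d] if n >= d else 0.0
        fb_l = out_l[n - d] if n >= d else 0.0
        out_l[n] = left[n] - atten * fb_r
        out_r[n] = right[n] - atten * fb_l
    return out_l, out_r
```

Note the sign: where Uli's method *adds* a filtered portion of the opposite channel, this subtracts it, so the two techniques really are pulling in opposite directions.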
 
Psychoacoustics Conversation With Audiophile:

Me: How good or bad is the playback speaker/room interaction? That's THE question. Everything else is small stuff.

Audiophile: But, Michael, it's the small stuff that matters for me! I mean it's that last 1% that makes ALL the difference in my system. I mean that power cable changed everything!

Me: Have you ever measured your room's FR and time domain response? Do you own a calibrated microphone?

. . . .


A discerning audiophile will tell you the answer should be a little of both, as they are both relevant to optimum 2 channel sound reproduction.

And considering the final question, I never read an enthusiastic report with substance about musical performance from someone who owns a calibrated microphone and ignores the small stuff, and have read thousands of enlightening pages about how their systems sound with real and very demanding music from people who care about small stuff and do not own calibrated microphones. Surely IMMV.
 
This topic fascinates me & I believe that teasing out the psychoacoustic rules that determine great sound in our 2 channel reproduction chains is the next advance that audio reproduction has to address.

(...)

John,

It is really a fascinating subject and was partially addressed many times, but not systematically and with great controversy. Do you remember the debate on micro and macrodynamics, or on depth and height in the soundstage? ;)
 
John,

It is really a fascinating subject and was partially addressed many times, but not systematically and with great controversy. Do you remember the debate on micro and macrodynamics, or on depth and height in the soundstage? ;)
Ah, yes, now that you have reminded me, I remember, but maybe we can have another go without the agenda-driven stuff? You may think I'm naive but I choose curiosity over cynicism every time - I find it keeps open the pathway to new learning.
 
Context

I guess the thing that bothers me is the context dropping.

Uli's way of doing frequency dependent crosstalk is a very small feature of software that is a mega powerful DSP tool. In the right hands, his Acourate yields major transformational changes to ANY system. The part of it upon which you perseverate is very minor and insignificant by comparison. I wish I had never mentioned it.

Michael.
Sure, I'm not ignoring it, just not willing to overplay it either, so it's not a show stopper for me. In other words, I still think that significant improvements can be made to the rest of our systems independently of dealing with room treatments or room correction. I've been in rooms that have been treated &, apart from sound studios, I haven't noticed a big difference between them & untreated rooms. Maybe there is & I just haven't visited a very well done room with professionally designed treatments. The sound studios I have visited have substantial treatments that wouldn't work aesthetically in a home environment, so maybe I'm just biased?
 
Tim --
That would be good news for the technique, as panning placement of instruments is nearly universal. Even classic jazz and classical recordings used/use panning. Mics may not be right on top of the instruments; there may not be isolation of instruments by track as there are in a lot of modern studio recordings, and there will be some "bleed" of instruments and sections into the mics of other instruments and sections, but ambient recordings, made from a listening position with a stereo pair of mics are very rare in professional recording.

John --
Well you seem to have answered your own question above "You lost me. Can you give me an example of when timing differences are not used in the recording process?"

If I've answered my own question, I've missed it. Stereo panning is related to what "timing information"?

Tim
 
I guess the thing that bothers me is the context dropping.

Uli's way of doing frequency dependent crosstalk is a very small feature of software that is a mega powerful DSP tool. In the right hands, his Acourate yields major transformational changes to ANY system. The part of it upon which you perseverate is very minor and insignificant by comparison. I wish I had never mentioned it.

Michael.

Good point. Michael, I use other speakers to enhance the 2 channel experience. Uli's method uses his multiple feature digital software to enhance stereo with just 2 primary speakers, R&L, is that correct? Also have you been able to use all the features yet? I could ask a lot of questions, I really like this format.
 
I guess the thing that bothers me is the context dropping.

Uli's way of doing frequency dependent crosstalk is a very small feature of software that is a mega powerful DSP tool. In the right hands, his Acourate yields major transformational changes to ANY system. The part of it upon which you perseverate is very minor and insignificant by comparison. I wish I had never mentioned it.

Michael.
Is what you are saying, Michael, that Uli's approach makes no sense without room treatments or DRC?
 
I have only scratched the surface with Acourate and I am already amazed. I will dive in much deeper soon.

I have a simple right/left full range stereo system with active crossover for 2 mono subs on front/ back walls to cancel axial modes/ nulls. I currently use a DEQX for the speaker correction above 1 kHz and the active crossovers.

I will have a Lynx Hilo here tomorrow. I will use it to setup a 4CH system as described above using only the FIR filters generated from Acourate. I am sure it will be the most challenging chew I've bitten off as a music listener. :). But I have a head start given that my delays/crossovers are already very good. I can also lean on Uli a bit too. It will be exciting when I'm done.


Good point. Michael, I use other speakers to enhance the 2 channel experience. Uli's method uses his multiple feature digital software to enhance stereo with just 2 primary speakers, R&L, is that correct? Also have you been able to use all the features yet? I could ask a lot of questions, I really like this format.
 
I have only scratched the surface with Acourate and I am already amazed. I will dive in much deeper soon.

I have a simple right/left full range stereo system with active crossover for 2 mono subs on front/ back walls to cancel axial modes/ nulls. I currently use a DEQX for the speaker correction above 1 kHz and the active crossovers.

I will have a Lynx Hilo here tomorrow. I will use it to setup a 4CH system as described above using only the FIR filters generated from Acourate. I am sure it will be the most challenging chew I've bitten off as a music listener. :). But I have a head start given that my delays/crossovers are already very good. I can also lean on Uli a bit too. It will be exciting when I'm done.

Great....I love experimenting! I think you're on the cutting edge:)
 
No. I am saying your excitement for it makes no sense in its proper context.

OK, you mean that "flow" is such a small part of Acourate that I'm making more of it than I should?
I'm simply trying to investigate what might be interesting psychoacoustic techniques which we can learn from. It doesn't really matter to me where these techniques come from or what their context is unless they need the context to work.
I'm still not fully sure what your objection is?

Edit: Maybe I do see what your objection is. So let me try to explain - I'm looking at this maybe more globally than you, trying to establish what might be universal, generally applicable psychoacoustic "rules" for better sound. You seem to be looking at it as simply part of a very powerful DSP tool that can improve the response in your room. I'm not so much interested in the specific frequency/timing adjustments/settings that make your room more listenable - those will be different for everybody's room. I'm more interested in understanding what might be universally applicable to every playback system.
 
OK, you mean that "flow" is such a small part of Acourate that I'm making more of it than I should?
I'm simply trying to investigate what might be interesting psychoacoustic techniques which we can learn from. It doesn't really matter to me where these techniques come from or what their context is unless they need the context to work.
I'm still not fully sure what your objection is?

Have you even downloaded the free software from Acourate to test it?
 
Have you even downloaded the free software from Acourate to test it?

Not yet but I probably will, at some stage.
However, this thread is not about Acourate so I again don't see where you are coming from? Did you read the edit I added to my previous post? Maybe that explains my position better?
 
Maybe it makes some sense if I join the discussion. At least to give you some information about my intentions.

I've been working on room correction for many years now, driven simply by the idea of getting good playback WITHOUT turning my living room into a studio. I truly believe that in most cases music is not played in optimal environments, and changing the environment is not possible. If you can build your own dedicated listening room then just DO it. If you can't, then room correction by applying digital filters is the right way to go. Of course a mix of both is also allowed.

But is that the end of optimization? I have noticed the hot discussions about the differences between analog and digital playback. How often is it argued that the band limitation of 16/44 is the drawback, and that we must use at least 24/192 to capture all the musical details of a vinyl recording?
But I have also noticed that even a good setup including room correction can sound nasty or fatiguing. So what's behind that? That's what I'm trying to understand.

Of course the basic flaw of stereo playback is crosstalk. It has been studied and worked on for a long time now. We know that crosstalk cancellation can improve the result, but it also has drawbacks. How many stereo setups do you know of, or even use, with speakers positioned close together (to optimize crosstalk cancellation) instead of the usual triangle?

Then I came across publications by Sengpiel, see http://www.sengpielaudio.com/FrequenzabhHoerereignisrichtung.pdf and http://www.sengpielaudio.com/Shuffler.pdf. This is a topic independent of room correction. Indeed this topic is about SUBTLE but still important influences.
But dealing with digital filters of course also allows one to play with introducing a frequency dependent crosstalk. That's why I have become more and more interested in psychoacoustic aspects.

My current personal model uses an example from optics:
Imagine looking through a pair of binoculars. Each ocular is perfect. We see 3D pretty well. Indeed we are lucky here; there is no crosstalk. No effort for the brain.
Now assume that each ocular has its own focus control, and let's assume bad influences on the controls. They may be affected by noise, certain frequencies or jitter. Both focus controls may change in sync or fully independently.
I believe everyone can easily imagine that the view through the binoculars is not pleasing. The question is: how much distortion is allowed? Big focus changes will immediately be unacceptable. But can you imagine that there is just a subtle distortion which is not noticed at first sight? Which will lead to fatigue without knowing why?

I hope you get the idea. I believe it's the same with listening with two ears (though not as obvious as the optical example). Our brain has to decode the sound. And if the sound contains certain kinds of distortion, different on each channel, the brain has to do more decoding work. I'm convinced that's the reason for annoyance and fatigue.

So I'm also motivated by the same idea that has led John to start this thread.

Right now I have found two answers. One answer is to add some frequency dependent crosstalk. I'm not fully happy with it; it seems we still do not know enough about what the best compensation curves are and how much crosstalk to add. Btw, it is difficult to measure. Subtleties are always difficult to grab. So: what are the best test signals, and so on.

A second answer I found by thinking about the equality of the playback channels. Assuming the existence of data-dependent jitter, and knowing that the stereo information is different between channels (otherwise it is not stereo), the question arises: how much do we perceive of the unequal distortions of a D/A conversion? So I made a test by encoding the L-R channels to M-S before D/A conversion and decoding the analog signals back to L-R right after the conversion with an analog circuit. The idea is to share the distortions across both L-R channels.
I had the opportunity to start a test during a 3-day audio presentation with Acourate workshops. The audience did not know about the principles, so the people could only hear the sound with or without the "black box". After the show the box was given the name "cleaner" by the audience! And indeed I have a new product, the AcourateCleaner ;)
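The M/S trick is easy to demonstrate numerically. The sketch below shows the lossless round trip and why an error hitting the mid channel (a stand-in for data-dependent converter distortion; the injected noise is purely illustrative) decodes as the *same* error on both outputs rather than a channel-specific one:

```python
import numpy as np

def ms_encode(l, r):
    """L/R to mid/side."""
    return 0.5 * (l + r), 0.5 * (l - r)

def ms_decode(m, s):
    """Mid/side back to L/R."""
    return m + s, m - s

rng = np.random.default_rng(0)
l = rng.standard_normal(1000)
r = rng.standard_normal(1000)

# The round trip is lossless:
m, s = ms_encode(l, r)
l2, r2 = ms_decode(m, s)

# A distortion added to the mid channel (stand-in for converter error)
# appears identically on both decoded outputs: l3 - l == r3 - r.
err = 0.01 * rng.standard_normal(1000)
l3, r3 = ms_decode(m + err, s)
```

Shared (common-mode) distortion gives the ear/brain the same error in both channels, which fits Uli's hypothesis that channel-different distortion is what costs extra decoding effort.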

In the meantime I have got some more questions. E.g. there is aliasing, which means frequencies get mirrored back into the audible range. Again the left and right channels contain different information, so again we get different distortions. Can we perceive it?

In the end: is that all psychoacoustics? Or is it a question of how much decoding effort is necessary?

- Uli
 
