Objectivists - what might be wrong with this label/viewpoint!!

Are you not holding up the human ability to filter audio perception and concentrate on one thing out of many as evidence that there is something in the waveform that is not measured?
Not in the way you state it, no. Let me try again.

We perceptually ascertain audio objects in what we hear by the brain processing that we perform on the signal. The perception of these audio objects occurs because we seem to cross correlate particular signal markers which we associate with that particular audio object - spatial location, timbre, temporal coherence, & amplitude all seem to play a role - let's call these our perceptual rules for identifying this object.

Now, as we are following this audio object dynamically during the audio playback we construct an audio stream. We seem to temporarily lose this audio streaming if one or more of these signal markers strays outside of our perceptual rules for this object. This is demonstrated in the paper I linked to "Effects of self-motion on auditory scene analysis" which shows that a head movement can temporarily collapse our perception of two streams into one, momentarily.

Now this gives rise to some questions for me (maybe I'm being too simplistic in my thinking?) but what if instead of a head movement, we keep the head steady but the audio playback stream itself has a similar momentary change in one of these signal markers - in this case a temporary change in the temporal coherence of the signal marker. Would this cause a temporary resetting of our perceptual streaming - the research would indicate that it would.

How would this be perceived - an uneasiness with the sound, a less realistic portrayal of the sound stage, a playback that seemed less relaxing, more tiresome than one where this temporal issue wasn't happening?

If you are following me so far, then I'm interested in how such a blip in the temporal coherence would be picked up in measurements, considering that such an event may happen only sporadically during the entire playback. FFTs won't suffice as the measurement tool as they are not run on a full playback, AFAIK, & anyway require that such an issue be a repeating pattern so that the FFT could lift it above the noise floor. What other means of measurement will reveal such an event, given that it is such a low-level change?
If I've got that much right, here's a clarification: I believe all of the information the brain needs to perceive "oboe" is there, and that the ability to focus on that particular instrument is all in the brain. No unmeasured, undiscovered data required. Frequency, amplitude...all the usual, measurable stuff is enough for the brain to work with.

Clearer?
Yes, it's not a problem for the brain's perceptual engine but I'm suggesting that it is a problem to see this in measurements - is that clearer?
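
As a rough illustration of the measurement question raised here: if a time-aligned copy of the source is available, a block-by-block null (difference) test can localise a one-off timing glitch without needing it to repeat. The sketch below is a toy under stated assumptions: a synthetic 1 kHz tone, a hypothetical 10 microsecond delay lasting 5 ms, no noise or clock drift, and illustrative names such as `residual_db_per_block`. It says nothing about whether such a glitch would be audible.

```python
import numpy as np

fs = 48_000                          # sample rate in Hz (assumed)
t = np.arange(10 * fs) / fs          # 10 seconds of "playback"
reference = np.sin(2 * np.pi * 1000 * t)

# Hypothetical fault: 240 samples (5 ms) starting at 6.25 s are delayed by 10 microseconds.
glitch_start, glitch_len, delay = 300_000, 240, 10e-6
playback = reference.copy()
seg = slice(glitch_start, glitch_start + glitch_len)
playback[seg] = np.sin(2 * np.pi * 1000 * (t[seg] - delay))

def residual_db_per_block(ref, test, block):
    """RMS level (dB re full scale) of the difference signal in each block."""
    n = len(ref) // block
    resid = (test[:n * block] - ref[:n * block]).reshape(n, block)
    rms = np.sqrt(np.mean(resid ** 2, axis=1))
    return 20 * np.log10(np.maximum(rms, 1e-12))   # floor avoids log(0)

block = 480                          # 10 ms blocks
levels = residual_db_per_block(reference, playback, block)
worst = int(np.argmax(levels))
worst_time = worst * block / fs      # start time of the worst block, in seconds
print(f"largest residual at {worst_time:.2f} s, level {levels[worst]:.1f} dB")
```

With a real capture the residual would also contain noise, gain error and clock drift, so alignment and thresholding become the hard part; the point is only that a single sporadic event is visible in a time-localised comparison.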
 
Not in the way you state it, no. Let me try again.

We perceptually ascertain audio objects in what we hear by the brain processing that we perform on the signal. The perception of these audio objects occurs because we seem to cross correlate particular signal markers which we associate with that particular audio object - spatial location, timbre, temporal coherence, & amplitude all seem to play a role - let's call these our perceptual rules for identifying this object.

Now, as we are following this audio object dynamically during the audio playback we construct an audio stream. We seem to temporarily lose this audio streaming if one or more of these signal markers strays outside of our perceptual rules for this object. This is demonstrated in the paper I linked to "Effects of self-motion on auditory scene analysis" which shows that a head movement can temporarily collapse our perception of two streams into one, momentarily.

Now this gives rise to some questions for me (maybe I'm being too simplistic in my thinking?) but what if instead of a head movement, we keep the head steady but the audio playback stream itself has a similar momentary change in one of these signal markers - in this case a temporary change in the temporal coherence of the signal marker. Would this cause a temporary resetting of our perceptual streaming - the research would indicate that it would.

How would this be perceived - an uneasiness with the sound, a less realistic portrayal of the sound stage, a playback that seemed less relaxing, more tiresome than one where this temporal issue wasn't happening?

If you are following me so far, then I'm interested in how such a blip in the temporal coherence would be picked up in measurements, considering that such an event may happen only sporadically during the entire playback. FFTs won't suffice as the measurement tool as they are not run on a full playback, AFAIK, & anyway require that such an issue be a repeating pattern so that the FFT could lift it above the noise floor. What other means of measurement will reveal such an event, given that it is such a low-level change?
Yes, it's not a problem for the brain's perceptual engine but I'm suggesting that it is a problem to see this in measurements - is that clearer?

John


You are fully aware that this is the realm of emotions. You may simply hate a given piece of gear. I have seen people touting how great something sounded, for example thinking that it was analog when in fact it was digital. The second they learn the source, in that case digital, a wave of perception, often negative, suddenly washes every prior perception away. The stimulus didn't change; our reaction to it did.
Notice that you use terms that are all scientific and thus based on repeatable, measurable aspects of reality. Temporal coherence is one such term, whatever it means. If it happens sporadically and we perceive it, why wouldn't a mike perceive it as well? I'm not saying that we know how to measure it all, but if the phenomenon is repeatable and most biases are eliminated (a tall order, I will admit), then why can't it be measured?

Anyway, if your point is to prove that there are aspects we can hear but can't measure, you need to come up with better examples.
 
John

Happy Holidays ...

I don't get your point. The oboe, or whatever instrument, has a known harmonic content: its timbre. We focus on it and hear it. If it were software, it would be exactly the same thing: it would focus on the known harmonic content and follow it. That is what is done in karaoke software to remove the voice and allow you to put in yours or mine. If you don't know what an oboe or a human voice is composed of (or, for a human, sounds like), you cannot distinguish it from the rest of the orchestra (noise). Same with the software.

If the note has a physical reality it will be heard by a mike or by software. BTW, the same note doesn't mean the same harmonic content ... This is not that hard to do ...
I think you are being too simplistic in your view. To get an idea of auditory scene analysis try the demos here
 
Nope, John. If the physical stimulus provides us with a perception, then repeating it (recording is one way of doing so) is an act of getting the data, storing it and reproducing it. It is a measurement. That seems to be making my case ...

Different strokes and all that ...
 
John


You are fully aware that this is the realm of emotions. You may simply hate a given piece of gear. I have seen people touting how great something sounded, for example thinking that it was analog when in fact it was digital. The second they learn the source, in that case digital, a wave of perception, often negative, suddenly washes every prior perception away. The stimulus didn't change; our reaction to it did.
No, Frantz, this is an active area of research - my questions about how such auditory perception issues are perceived may well cross over into what you claim can be confused with emotional responses to audio playback but I was just surmising as to how the playback of such audio might be perceived (an audio piece that invoked such audio stream collapsing)
Notice that you use terms that are all scientific and thus based on repeatable, measurable aspects of reality. Temporal coherence is one such term, whatever it means. If it happens sporadically and we perceive it, why wouldn't a mike perceive it as well? I'm not saying that we know how to measure it all, but if the phenomenon is repeatable and most biases are eliminated (a tall order, I will admit), then why can't it be measured?
I'm not saying that the mike won't pick it up - I'm saying that it might happen sporadically & at such a low level that we need a different approach to measuring it - one that can analyse a full recording & check it for temporal coherence (btw, you'll find temporal coherence demonstrated & explained in that demo)

Anyway, if your point is to prove that there are aspects we can hear but can't measure, you need to come up with better examples.
That is not exactly what I'm saying - it's expanded on in my reply to Tim above.
 
(...) Anyway, if your point is to prove that there are aspects we can hear but can't measure, you need to come up with better examples.

Frantz,

IMHO the main question is that a measurement must correlate significantly with some aspect of sound quality. Surely no two IC cables will return you similar measurements - we all know that. But currently you cannot anticipate all their sound properties based only on measurements.
 
Frantz,

IMHO the main question is that a measurement must correlate significantly with some aspect of sound quality. Surely no two IC cables will return you similar measurements - we all know that. But currently you cannot anticipate all their sound properties based only on measurements.

Fair enough, and we have no qualms there about the bolded part above. As for what constitutes the threshold of "significance", we can have a meaningful debate.
One question: do we agree that we cannot perceive a change of 0.001 dB? If yes, then we can go on; else we may have to drop it ...

Also I would agree with your last sentence with some slight modifications if you allow me ...

Currently we cannot anticipate all their sound properties with the usual set of current measurements ...
 
Not in the way you state it, no. Let me try again.

We perceptually ascertain audio objects in what we hear by the brain processing that we perform on the signal. The perception of these audio objects occurs because we seem to cross correlate particular signal markers which we associate with that particular audio object - spatial location, timbre, temporal coherence, & amplitude all seem to play a role - let's call these our perceptual rules for identifying this object.

Now, as we are following this audio object dynamically during the audio playback we construct an audio stream. We seem to temporarily lose this audio streaming if one or more of these signal markers strays outside of our perceptual rules for this object. This is demonstrated in the paper I linked to "Effects of self-motion on auditory scene analysis" which shows that a head movement can temporarily collapse our perception of two streams into one, momentarily.

Now this gives rise to some questions for me (maybe I'm being too simplistic in my thinking?) but what if instead of a head movement, we keep the head steady but the audio playback stream itself has a similar momentary change in one of these signal markers - in this case a temporary change in the temporal coherence of the signal marker. Would this cause a temporary resetting of our perceptual streaming - the research would indicate that it would.

How would this be perceived - an uneasiness with the sound, a less realistic portrayal of the sound stage, a playback that seemed less relaxing, more tiresome than one where this temporal issue wasn't happening?

If you are following me so far, then I'm interested in how such a blip in the temporal coherence would be picked up in measurements, considering that such an event may happen only sporadically during the entire playback. FFTs won't suffice as the measurement tool as they are not run on a full playback, AFAIK, & anyway require that such an issue be a repeating pattern so that the FFT could lift it above the noise floor. What other means of measurement will reveal such an event, given that it is such a low-level change?
Yes, it's not a problem for the brain's perceptual engine but I'm suggesting that it is a problem to see this in measurements - is that clearer?

I think I'm following you, though I'll admit I'm in pretty deep. Two wave forms are coherent if they have constant phase differences at the same frequency, yes? What is immeasurable there? Do we need to imagine some property that is unknown and immeasurable as responsible? I don't think we do.

And I just don't think it is as fragile as you seem to think it is. I have, all of my adult life, been able to focus on and perceptually isolate not only one instrument or voice, but several. When learning harmony parts from recordings, I hear the chord -- two or more voices or instruments together. I pull them forward, perceptually, from the rest of the music, to analyze them individually and together, in relationship to each other. Do it all the time. So do millions of other singers who learn music by ear. And I've done it on some painfully unsophisticated audio equipment. I just don't think there's any mystery or magic here...well, let me rephrase. I don't think there's any mystery or magical, undiscovered content in the waveform. I think the magic, and the mystery, is in the human brain.

Now, if you can figure out how the human brain pulls those perceptual tricks and build magic into your DACs that makes that, the organic audio processor, perceive a more natural sound...well, you're going to be a very rich man. Can I buy in before you go public? :)

Tim
 
I think I'm following you, though I'll admit I'm in pretty deep. Two wave forms are coherent if they have constant phase differences at the same frequency, yes? What is immeasurable there? Do we need to imagine some property that is unknown and immeasurable as responsible? I don't think we do.
I believe temporal coherence is explained in that demo I linked to - did you visit it?

And I just don't think it is as fragile as you seem to think it is.
The other research paper I linked to disagrees with you - head movement causes a collapse of the perception of two audio streams into one, momentarily & then resets to the new perceptual cues & resumes the two stream perception
I have, all of my adult life, been able to focus on and perceptually isolate not only one instrument or voice, but several. When learning harmony parts from recordings, I hear the chord -- two or more voices or instruments together. I pull them forward, perceptually, from the rest of the music, to analyze them individually and together, in relationship to each other. Do it all the time. So do millions of other singers who learn music by ear. And I've done it on some painfully unsophisticated audio equipment.
Yes, an auditory object does not have to be a single instrument - it can be a collection of instruments that are grouped together, e.g. the string section
I just don't think there's any mystery or magic here...well, let me rephrase. I don't think there's any mystery or magical, undiscovered content in the waveform. I think the magic, and the mystery, is in the human brain.
Absolutely, the magic is in the brain & by knowing more about this magic we could use this new information to inform our measurements, develop new measurements & hopefully develop more realistic playback systems.

Now, if you can figure out how the human brain pulls those perceptual tricks and build magic into your DACs that makes that, the organic audio processor, perceive a more natural sound...well, you're going to be a very rich man. Can I buy in before you go public? :)

Tim
In my experience these matters only progress incrementally. I've recently heard DSD playback through a Lampizator DAC & its realism was far superior to the same file played back in PCM. I bet that an analysis of its analogue output using existing standard measurements would NOT reveal any difference between the two outputs.

BTW, thank you for sticking with this & being patient with my stumbling attempts at trying to describe my thoughts :)
 
Huh??? I don't know what you are talking about.

I guess so. Logic dictates that we need to test audio equipment in accordance with the task it performs. Since your DAC doesn't measure & isolate the 2nd violin from the string section & follow it through the playing of the piece, why would objective tests need to measure its ability to do that?
 
I believe temporal coherence is explained in that demo I linked to - did you visit it?

It was a pretty simple question. Do I really have to wade through a demo to get a yes or no?

The other research paper I linked to disagrees with you - head movement causes a collapse of the perception of two audio streams into one, momentarily & then resets to the new perceptual cues & resumes the two stream perception
Yes, an auditory object does not have to be a single instrument - it can be a collection of instruments that are segregated together i.e the string section
Absolutely, the magic is in the brain & by knowing more about this magic we could use this new information to inform our measurements, develop new measurements & hopefully develop more realistic playback systems.

Maybe I recalibrate quicker than the people in that study, I don't know. What I do know is that I listen in the near field. A head movement is more critical in that set up than in most. And yes, I feel things shift when I move. I don't perceive two audio streams collapsing into one, but the imaging, and particularly the phantom center moves. I adapt pretty quickly. I'm not sure how a deeper understanding of the processing of audio in the brain is going to lead to more/different/better measurements of the waveforms outside of the ears, but perhaps, given a better understanding of what the brain is doing, you can manipulate the waveform to stimulate a desired response. Good luck.

In my experience these matters only progress incrementally. I've recently heard DSD playback through a Lampizator DAC & its realism was far superior to the same file played back in PCM. I bet that an analysis of its analogue output using existing standard measurements would NOT reveal any difference between the two outputs.

I wouldn't bet against you on that one.

BTW, thank you for sticking with this & being patient with my stumbling attempts at trying to describe my thoughts :)

No problem. I'm sure I'm not being perfectly clear either. We're talking about stuff that just doesn't fit in the box, if you get my meaning.

Tim
 
It was a pretty simple question. Do I really have to wade through a demo to get a yes or no?
Sure it was a simple question but to give you a proper answer would entail a lot of explanation (look at my last answer to your equally simple question :).

"The general idea is that sound is analyzed through a multitude of parallel neural channels, each expressing various attributes of sound (periodicity, spatial location, temporal and spectral modulations, etc.). The problem of ASA is then to bind a sub-set of those channels together, with the aim that all channels dominated by a given acoustic source will be bound together and, if possible, not bound with channels dominated by other sources. The suggested principle is temporal coherence between channels (as measured by correlation over relatively long time windows). Coherent channels are grouped as a single stream, whereas low coherence indicates more than one stream"

Any clearer? This text taken from here

"ASA truly behaves as if it were an inference process relying on a variety of sensory cues. These cues are evaluated from the proximal acoustic wave and concomitant neural activity, but they are also weighted with respect to their physical plausibility by means of a form of embodied knowledge (not necessarily explicit and not necessarily operating in a top-down manner) of some of the laws of the acoustics of sound sources."

So, all our perceptions (not just auditory) are really a guessing game "Perception is an active construct, more akin to a moment-by moment gambling process than to a rolling camera or open microphone."
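
To make "temporal coherence between channels (as measured by correlation over relatively long time windows)" a little more concrete, here is a toy sketch. It is not the model from the quoted paper: the "channels" are simply short-time FFT bins, and the test signal, frame sizes and the name `channel_envelopes` are all assumptions chosen for illustration. Two tones that switch on and off together end up with strongly correlated envelopes, while a third tone that alternates with them does not.

```python
import numpy as np

fs = 16_000
t = np.arange(4 * fs) / fs
gate_a = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)   # 4 Hz on/off pattern
gate_b = 1.0 - gate_a                                     # on whenever gate_a is off

# 500 Hz and 1500 Hz share one gating pattern; 3000 Hz alternates with them.
mix = (gate_a * np.sin(2 * np.pi * 500 * t)
       + gate_a * np.sin(2 * np.pi * 1500 * t)
       + gate_b * np.sin(2 * np.pi * 3000 * t))

def channel_envelopes(x, frame=1024, hop=256):
    """Short-time magnitude per frequency bin, shape (time frames, bins)."""
    n = 1 + (len(x) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n)[:, None]
    return np.abs(np.fft.rfft(x[idx] * np.hanning(frame), axis=1))

env = channel_envelopes(mix)
bins = [round(f * 1024 / fs) for f in (500, 1500, 3000)]
coherence = np.corrcoef(env[:, bins].T)    # 3x3 envelope correlation matrix

print(np.round(coherence, 2))
```

In this toy the first two channels would be grouped as one "stream" and the third kept apart, purely on the basis of envelope correlation over the four-second window.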
 
Sure it was a simple question but to give you a proper answer would entail a lot of explanation (look at my last answer to your equally simple question :).

"The general idea is that sound is analyzed through a multitude of parallel neural channels, each expressing various attributes of sound (periodicity, spatial location, temporal and spectral modulations, etc.). The problem of ASA is then to bind a sub-set of those channels together, with the aim that all channels dominated by a given acoustic source will be bound together and, if possible, not bound with channels dominated by other sources. The suggested principle is temporal coherence between channels (as measured by correlation over relatively long time windows). Coherent channels are grouped as a single stream, whereas low coherence indicates more than one stream"

Any clearer? This text taken from here

"ASA truly behaves as if it were an inference process relying on a variety of sensory cues. These cues are evaluated from the proximal acoustic wave and concomitant neural activity, but they are also weighted with respect to their physical plausibility by means of a form of embodied knowledge (not necessarily explicit and not necessarily operating in a top-down manner) of some of the laws of the acoustics of sound sources."

So, all our perceptions (not just auditory) are really a guessing game "Perception is an active construct, more akin to a moment-by moment gambling process than to a rolling camera or open microphone."

Yep. But I still think your undiscovered "measurements" are going to be found in the head, not the waveform. And if I'm right, you're going to need some different instruments to measure with.

Tim
 
Sure it was a simple question but to give you a proper answer would entail a lot of explanation (look at my last answer to your equally simple question :).

"The general idea is that sound is analyzed through a multitude of parallel neural channels, each expressing various attributes of sound (periodicity, spatial location, temporal and spectral modulations, etc.). The problem of ASA is then to bind a sub-set of those channels together, with the aim that all channels dominated by a given acoustic source will be bound together and, if possible, not bound with channels dominated by other sources. The suggested principle is temporal coherence between channels (as measured by correlation over relatively long time windows). Coherent channels are grouped as a single stream, whereas low coherence indicates more than one stream"

Any clearer? This text taken from here

"ASA truly behaves as if it were an inference process relying on a variety of sensory cues. These cues are evaluated from the proximal acoustic wave and concomitant neural activity, but they are also weighted with respect to their physical plausibility by means of a form of embodied knowledge (not necessarily explicit and not necessarily operating in a top-down manner) of some of the laws of the acoustics of sound sources."

So, all our perceptions (not just auditory) are really a guessing game "Perception is an active construct, more akin to a moment-by moment gambling process than to a rolling camera or open microphone."

I wouldn't take that info and then say perception is really a guessing game. An active construct is one thing. It isn't a camera snapshot or a mic input, but all the construct comes from input from the camera and the mike stimulating the construct. Those inputs have a relationship with what is likely to be going on out there in the real world.

And like Tim put it, those are all going on inside the head, not with the waveform. And the waveform is the source of what gets constructed. It also explains why non-perceived effects in the brain will affect what is perceived. But that is all after the basic input from the ear, which is a much simpler process at this point. None of this supports the idea that objective information about sound is in error or missing something important that the ear can pick up on.
 
I wouldn't take that info and then say perception is really a guessing game. An active construct is one thing. It isn't a camera snapshot or a mic input, but all the construct comes from input from the camera and the mike stimulating the construct. Those inputs have a relationship with what is likely to be going on out there in the real world.
Well I think logic & the experts disagree with you as do I. Here's an explanation that might convince you?
All the pressure waves from different sound sources combine linearly in the air. As a result, any waveform observed at the ears may have been caused by one, two, or in fact an unknown number of sound sources. Determining the number and nature of sound sources from the compound waveform is an ill-posed problem. This is what is known as an ill-posed problem in mathematics. There are too many unknowns (in fact, an unknown number of unknowns) for too few observations. The problem cannot be solved without further assumptions
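
A minimal numerical illustration of why the passage above calls this ill-posed, using made-up tones and hypothetical variable names: three different sets of "sources" sum to exactly the same waveform, so the waveform alone cannot say how many sources produced it.

```python
import numpy as np

fs = 8_000
t = np.arange(fs) / fs
tone_a = np.sin(2 * np.pi * 440 * t)
tone_b = 0.5 * np.sin(2 * np.pi * 660 * t)

# Decomposition 1: two separate sources, one per tone.
sources_1 = [tone_a, tone_b]
# Decomposition 2: a single source that already contains both tones.
sources_2 = [tone_a + tone_b]
# Decomposition 3: three sources that split the first tone unevenly.
sources_3 = [0.3 * tone_a, 0.7 * tone_a, tone_b]

# Pressure waves add linearly, so each mixture is just the sum of its sources.
mixtures = [np.sum(s, axis=0) for s in (sources_1, sources_2, sources_3)]
same = all(np.allclose(mixtures[0], m) for m in mixtures[1:])
print("identical waveform at the ear:", same)
```

Any method that recovers the sources, whether a brain or an algorithm, therefore has to bring in extra assumptions such as common onset, harmonicity or location.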

And like Tim put it, those are all going on inside the head, not with the waveform. And the waveform is the source of what gets constructed. It also explains why non-perceived effects in the brain will affect what is perceived. But that is all after the basic input from the ear, which is a much simpler process at this point. None of this supports the idea that objective information about sound is in error or missing something important that the ear can pick up on.
I guess I haven't explained myself well enough or you haven't understood what I have been saying? What we hear is an active construct of audio streams determined by the "rules" of ASA that are being actively researched - we can't avoid this - it's how our perception of hearing works. So this is what is important to us, it's what we perceive, not the individual frequencies/amplitude of notes.

Now let's take the nearly universally agreed mechanism for how this works - Temporal Coherence - the idea that we split the incoming audio stream into separate neurological channels, where each channel represents a frequency map over time. So in these channels we have a matrix of time & frequency, & Temporal Coherence suggests that audio streaming occurs by sensing those channels whose signal features correlate & grouping them together - in other words, the signals in each of these channels are grouped together as coming from one audio object. This seems to work pretty accurately, as we can judge that the audio objects resulting from this construct actually do represent real-world objects.

So this correlation between channels works over a relatively long timeframe, i.e. the timing of signal features is being continually compared over many seconds as we continuously monitor the incoming audio signals. Until we begin to do measurements that take these factors into account, we are just guessing at how our measurements correlate with what we hear. What I'm suggesting is that we need to find a new set of JNDs (just-noticeable differences) which relate not to frequency/amplitude but to what causes disturbance in audio streaming. As I already linked to, that paper on head movement showed that a small timing/amplitude difference (from the head movement) of microseconds/fractions of a decibel is enough to disturb an audio stream & interfere with our perception of an audio object. So if these small differences were in the recording/playback & not from head movement, we would perceive the same disturbance.

Now my question is simple - which of our existing measurements will reveal microsecond timing differences or fraction-of-a-decibel amplitude differences at particular frequencies across many seconds of audio? Remember we don't know where, in the full recording of many minutes, these differences might occur. In fact, do we not need a measuring system that tries to emulate the mechanism of temporal coherence - one which splits the incoming audio into a matrix of time/frequency & analyses this matrix? Quite a storage & computational task.
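
One way to picture a "matrix of time/frequency" comparison is sketched below, as a toy only. It assumes a time-aligned reference is available and uses a synthetic two-tone signal, a hypothetical 0.5 dB dip lasting 100 ms in one component, and arbitrary frame sizes and thresholds; `magnitude_matrix` is an illustrative name, not an established tool. Each cell of the matrix is compared in dB and the cells that differ are reported with their time and frequency.

```python
import numpy as np

fs = 48_000
t = np.arange(5 * fs) / fs
low = np.sin(2 * np.pi * 1000 * t)
high = np.sin(2 * np.pi * 3000 * t)
reference = low + high

# Hypothetical fault: the 3 kHz component drops by 0.5 dB for 100 ms at 2.0 s.
gain = np.ones_like(t)
start = 2 * fs
gain[start:start + fs // 10] = 10 ** (-0.5 / 20)
playback = low + gain * high

def magnitude_matrix(x, frame=960, hop=480):
    """Short-time magnitude spectrum, shape (time frames, frequency bins)."""
    n = 1 + (len(x) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n)[:, None]
    return np.abs(np.fft.rfft(x[idx] * np.hanning(frame), axis=1))

frame, hop = 960, 480
ref_mag = magnitude_matrix(reference, frame, hop)
test_mag = magnitude_matrix(playback, frame, hop)

# Compare only cells that carry real signal energy in the reference.
mask = ref_mag > 0.1 * ref_mag.max()
diff_db = np.zeros_like(ref_mag)
diff_db[mask] = 20 * np.log10(test_mag[mask] / ref_mag[mask])

flagged = np.argwhere(np.abs(diff_db) > 0.2)    # rows of (frame index, bin index)
flag_times = flagged[:, 0] * hop / fs           # frame start times in seconds
flag_freqs = flagged[:, 1] * fs / frame         # bin centre frequencies in Hz
print(f"{len(flagged)} cells flagged, "
      f"{flag_times.min():.2f}-{flag_times.max():.2f} s, "
      f"{flag_freqs.min():.0f}-{flag_freqs.max():.0f} Hz")
```

For five seconds of audio the matrix is small; for many minutes of music at finer resolution the storage and computation grow accordingly, and real captures would add noise and alignment error that this sketch ignores.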

Without measurements that take into account how our auditory perception works we are doing the equivalent of looking through an electron microscope at cell micro-tubules from two different animals & trying to tell how different the animals will look. In fact there will be no difference in the micro-tubules & we now know that we should be looking at the DNA differences, not using an electron microscope at all. But, I hear you argue, we've invested so much in this electron microscope & learning how to use it!!!

As I said, I'm reminded of the L Cohen lyric "I'm guided by the beauty of our weapons"
 
So John, is this what you're working on? A "measuring system that takes into account how our auditory perception works"? And then you're going to develop products based on the data from those measurements?

Tim
 
Maybe I recalibrate quicker than the people in that study, I don't know.
A bit of explanation is needed about the test signals used to study ASA. Generally, these studies use a two-tone test signal that is the equivalent of the visual illusion of the vase perceived between two face profiles facing each other. In the visual illusion we either see the vase or the two faces - we vacillate between the two perceptions - this is called a bistable perception. The same type of bistable signal is used to investigate auditory perception - two tones alternated - so frequency & timing can be investigated. Depending on the frequency/timing relationship, we either hear two separate audio streams or one audio stream. So I doubt you are comparing like with like when comparing yourself with the study's subjects. Here's the definitive series of such two-tone demos/tests http://webpages.mcgill.ca/staff/Group2/abregm1/web/downloadstoc.htm#10
 
So John, is this what you're working on? A "measuring system that takes into account how our auditory perception works"? And then you're going to develop products based on the data from those measurements?

Tim

Tim, I'm always researching this area & there's a lot more to be found out before any such measurement system is developed but this doesn't mean that incremental progress can't be made - I don't see why more accurate, more relevant measurements can't be developed that better relate to what we hear? This will require a better understanding of the workings of our auditory perception
 
Tim, I'm always researching this area & there's a lot more to be found out before any such measurement system is developed but this doesn't mean that incremental progress can't be made - I don't see why more accurate, more relevant measurements can't be developed that better relate to what we hear? This will require a better understanding of the workings of our auditory perception

...and in the meantime we just figure that all that stuff which we opine is superior, but which currently measures as inferior, has somehow hit on these cues to human audio perception by accident or by ear? And that most of the stuff (digital, SS, not expensive enough) that measures great, but is not high-end approved, has somehow, accidentally, gotten all these perceptual cues terribly wrong?

That seems to be the audiophile argument.

Tim
 
