Ultimately, the need to "prove" that what we "really" hear depends only on the air vibrations & nothing else is a doomed quest, one that ignores the nature of our auditory perception.
From the non-linear behaviour of the inner ear & the intermodulation products it generates, we can already see that what we hear is not linearly related to the air vibrations.
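To make that concrete, here is a minimal Python sketch, not a cochlear model: pass two pure tones through an arbitrary polynomial nonlinearity & the output spectrum gains intermodulation products at frequencies that were never in the air. The tone frequencies & polynomial coefficients below are illustrative assumptions.

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs                    # one second of samples
f1, f2 = 1000.0, 1200.0                   # two input tones (Hz), chosen arbitrarily
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

# Toy memoryless nonlinearity standing in for the inner ear's
# level-dependent response; the coefficients are arbitrary.
y = x + 0.2 * x**2 + 0.1 * x**3

spectrum = np.abs(np.fft.rfft(y)) / len(y)
freqs = np.fft.rfftfreq(len(y), 1 / fs)

# Besides f1 and f2, the output now holds products that were never in
# the input, e.g. f2 - f1 = 200 Hz and 2*f1 - f2 = 800 Hz.
print(freqs[spectrum > 1e-3])
```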
The following is taken from this article:
When auditory processing itself is examined, we see that both top-down & bottom-up processing are essential parts of auditory perception.
Bottom-up processing techniques are characterized by the fact that all information flows bottom-up: information is observed in an acoustic waveform, combined to provide meaningful auditory cues, and passed to higher level processes for further interpretation. This approach is also called data-driven processing.
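(An illustrative aside, not part of the quoted article: here is a crude, purely data-driven cue extractor in Python. It derives note-onset times from the waveform's short-time energy alone & passes them upward with no model of what produced the sound. The window length & threshold are assumed values.)

```python
import numpy as np

def onset_times(x: np.ndarray, fs: int, win: int = 512, thresh: float = 4.0):
    """Times (s) where short-time energy jumps above `thresh` times the
    previous frame's energy: a crude, purely data-driven onset cue."""
    frames = x[: len(x) // win * win].reshape(-1, win)
    energy = (frames ** 2).mean(axis=1) + 1e-12   # avoid division by zero
    jumps = energy[1:] / energy[:-1] > thresh
    return (np.flatnonzero(jumps) + 1) * win / fs

# Example: half a second of silence, then a tone burst. The cue flows
# upward with no knowledge of what made the sound.
fs = 16_000
x = np.concatenate([np.zeros(fs // 2),
                    np.sin(2 * np.pi * 440 * np.arange(fs // 2) / fs)])
print(onset_times(x, fs))    # -> [0.48], near the true onset at 0.5 s
```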
Top-down processing utilizes internal, high-level models of the acoustic environment and prior knowledge of the properties and dependencies of the objects in it. In this approach information also flows top-down: a sensing system collects evidence that would either justify or cause a change in an internal world model and in the state of the objects in it. This approach is also called prediction-driven processing, because it is strongly dependent on the predictions of an abstracted internal model, and on prior knowledge of the sound sources.
Top-down techniques can add to bottom-up processing and help it to resolve otherwise ambiguous situations. Top-down rules may confirm one interpretation or rule out others. On the other hand, high-level knowledge can guide the attention and sensitivity of the low-level analysis.
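(Another aside, not from the article: a toy Bayesian sketch of how a top-down prior can settle a bottom-up ambiguity, in the spirit of the "bit"/"bet" experiment quoted below. Every number in it is invented for illustration.)

```python
def posterior(likelihood: dict, prior: dict) -> dict:
    """Combine bottom-up evidence (likelihood) with a top-down context
    model (prior) via Bayes' rule, then normalize."""
    unnorm = {w: likelihood[w] * prior[w] for w in likelihood}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# Bottom-up: the acoustic evidence alone cannot separate the two words.
acoustic_likelihood = {"bit": 0.5, "bet": 0.5}

# Top-down: two different introductory speakers induce different priors.
prior_speaker_a = {"bit": 0.8, "bet": 0.2}
prior_speaker_b = {"bit": 0.2, "bet": 0.8}

print(posterior(acoustic_likelihood, prior_speaker_a))   # favours "bit"
print(posterior(acoustic_likelihood, prior_speaker_b))   # favours "bet"
```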
In psychoacoustic experiments, listeners were played a speech recording in which a certain syllable had been deleted and replaced by a noise burst. Because of the linguistic context, the listeners also 'heard' the removed syllable, and were even unable to identify exactly where the masking noise burst had occurred.
Sine-wave speech, in which the acoustic signal was modelled by a small number of sinusoid waves, was played to a group of listeners. Most listeners first recognized the signal as a series of tones, chirps, and blips with no apparent linguistic meaning. But after some period of time, all listeners unmistakably heard the words and had difficulty separating the tones and blips. The linguistic information changed the perception of the signal. In music, internal models of the instrument sounds and tonal context have an analogous effect.
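(Aside, not from the article: a minimal sketch of how sine-wave speech can be synthesized, one sinusoid per formant track. The formant tracks & amplitudes below are invented, not estimates taken from a real recording.)

```python
import numpy as np

fs = 16_000
n = int(0.5 * fs)                         # half a second

# Hypothetical time-varying formant tracks (Hz); invented values,
# not measurements from real speech.
tracks = [np.linspace(700, 300, n),       # F1
          np.linspace(1200, 2200, n),     # F2
          np.linspace(2600, 3000, n)]     # F3
amps = [1.0, 0.5, 0.25]

# One sinusoid per formant; the phase is the running integral of the
# track so that the frequency can vary over time.
phases = [2 * np.pi * np.cumsum(f) / fs for f in tracks]
swspeech = sum(a * np.sin(p) for a, p in zip(amps, phases))
swspeech /= np.max(np.abs(swspeech))      # normalize to [-1, 1]
```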
Scheirer mentions Thomassen's observation, which indicates that high-level melodic understanding in music may affect the low-level perception of the attributes of a single sound in a stream [Scheirer96a]. Thomassen observed that certain frequency contours of melodic lines lead to a percept of an accented sound, as if it had been played more strongly, although there was no change in the loudness of the sounds [Thomassen82].
Slaney illustrates the effect of context by explaining Ladefoged's experiment, where the same constant sample was played after two different introductory sentences [Slaney95]. Depending on the speaker of the introductory sentence "Please say what this word is: -", the listeners heard the subsequent constant sample to be either "bit" or "bet" [Ladefoged89].
Memory and hearing interact. In [Klapuri98] we have stated that paying attention to time intervals in rhythm and to frequency intervals of concurrent sounds has a certain goal among others: to unify the sounds to form a coherent structure that is able to express more than any of the sounds alone. We propose that also the structure in music has this function: similarities in two sound sequences tie these bigger entities together, although they may be separated in time and may differ from each other in details. These redundancies and repetitions facilitate the task of a human listener, and raise expectations in his mind. Only portions of a common theme need to be explicitly repeated to reconstruct the whole sequence in a listener's mind, and special attention can be paid to intentional variations in repeated sequences.
To suggest that top-down processing can be eliminated & that only bottom-up processing prevails is to misunderstand the workings of auditory perception.
Edit: The fact is that top-down processing actually changes the performance/functioning of auditory perception by tuning lower-level sensory mechanisms to increase neural signal-to-noise ratios (SNR).
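As a toy illustration of that SNR point, under the simplest possible assumption, noise injected after an attentional gain stage: raising the gain on the sensory signal improves the downstream SNR, roughly 6 dB per doubling. The gain & noise values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
signal = np.sin(2 * np.pi * 5 * np.arange(n) / n)   # unit-amplitude signal
noise = 0.5 * rng.standard_normal(n)                # fixed downstream noise

def snr_db(s: np.ndarray, v: np.ndarray) -> float:
    return 10 * np.log10(np.mean(s ** 2) / np.mean(v ** 2))

# Attention modelled as a multiplicative gain applied *before* the noise
# is added: every doubling of the gain buys about 6 dB of SNR.
for gain in (1.0, 2.0, 4.0):
    print(gain, round(snr_db(gain * signal, noise), 1))
```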