when it comes to complex harmonic structures, that many, many cilia are involved, with micro timing differentials between them. That these cilia are anchored in a 3d matrix, or odd-shaped 'sac-like' ball, in the ear. That this sac is motionally activated in three different axes, with wave flow in the structure and on the surface, much like multiple vibrations originating on the surface of, and within the fluid in, a water balloon.
Imagine all those cilia, or hairs, inserted in this 'fluid sac' in 3d, in an XYZ pattern. The motion of the ball in xyz form is going to be uneven, and all kinds of resonance and flow patterns will emerge in the flow and vibration. The cilia are activated by these complex and evolving motional patterns and send signals to the aural nerve complexes and on to the brain, which decodes and blends them at micro speed differentials that are FAR beyond and WELL above a lousy little 20 kHz. Thus each ear has a 3d xyz complex of decoding, with micro differentials in timing and level, for large numbers of cilia, all at the same time. (We use the leading-edge positive transient values, for the most part, and internally reconstruct/guesstimate from that.)
Not just some singular, poorly conceived and poorly executed comparison of the kind used in the world of audio engineering. The ear, as a complex system, is so far from that simplistic audio engineering and linear weighting idea that it is almost staggering, once one understands how far apart and how opposite the directions are.
The ear/brain simplifies the data rates by concentrating, for the larger part, on ONLY the leading-edge transient values and timing differentials for all the cilia, and we reconstruct the overall shape in the mind, or decoding engine. This cuts the data rates, in linearly weighted terms, by about 90%. The ear only utilizes micro differentials in multiple frequencies, in the context of this fluid sac's vibrational xyz patterns, over time.
I take no issue with your general argument, but I wonder where you found this information, which conflates auditory and vestibular transduction. How does auditory input encode "3d xyz", and where in the CNS is it correlated with auditory input?