What I'm suggesting is that before we consciously think about the timbre of an instrument & whether it accurately matches our memory of the live instrument, our analytic engine has done a lot of work subconsciously. This work at the subconscious level is very basic but also very complex.
It's best to use an analogy to explain what I mean - imagine you are sitting at the edge of a swimming pool with only your two feet dipped in the water & you can't see or hear. There are many people in the pool splashing, moving around, swimming, etc. All these pool activities happening at the same time create composite waves which arrive at your feet. Working out where people are & what they are doing in the pool, based only on the waves arriving at your two feet, is the equivalent of what auditory processing is doing. In other words, the waves that one swimmer is creating are identified & grouped together out of the composite mix, & this grouping is maintained as the swimmer moves through the pool. The same applies to all the people/objects creating waves: each is separately identified just by its waveform, not as a once-off but on an ongoing basis. As you can see, this is a very complex inference engine which requires many past examples from which to learn how waveforms in pools behave with different people/objects & actions creating the waves - so it's a heuristic inference engine.
So an internal working model is built based on the best fit to the sensed waves. This model has within it expectations of how these waves will behave (the heuristic element). But what happens if a wave arrives which doesn't properly fit into the existing model or, more likely, our best-fit analysis was wrong & this is only discovered when a wave arrives which doesn't fit? The working model has to change to best accommodate this. This is what is meant by saying that our perceptions are an interpretation of what's out there, really a best guess at any point in time (usually fairly accurate, or accurate enough for our continued existence in the world). It doesn't need to be highly accurate; rather, it needs to be fast & adapted to our needs.
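To make the idea of a working model that gets revised a bit more concrete, here is a toy sketch in Python. It is only an illustration under heavy assumptions, not a model of real auditory processing: the function name `track`, the single-number "expectation", & the `tolerance` and `rate` parameters are all made up for the example.

```python
def track(samples, tolerance=0.5, rate=0.2):
    """Toy working model (hypothetical): one expected value per source,
    revised whenever an arriving sample does not fit the expectation."""
    expected = samples[0]  # initial best guess from the first arriving wave
    revisions = 0          # how often the model had to be rebuilt
    for s in samples[1:]:
        error = s - expected
        if abs(error) > tolerance:
            # The arriving wave does not fit: rebuild the guess around it.
            expected = s
            revisions += 1
        else:
            # The wave fits: nudge the expectation slightly toward it.
            expected += rate * error
    return expected, revisions


steady = [1.0, 1.1, 0.9, 1.0, 1.05]  # waves that behave as expected
jumpy = [1.0, 2.0, 1.0, 2.0, 1.0]    # waves that keep breaking the expectation

print(track(steady)[1])  # number of revisions for the steady input
print(track(jumpy)[1])   # number of revisions for the jumpy input
```

With these made-up numbers the steady input never forces a revision, while the jumpy input forces one on every arriving sample (4 in total). The only point is that a model with expectations has to do extra work each time the input fails to fit.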
This is the job auditory processing is performing at a subconscious level: a working model is created in real time, representing the current auditory objects & tracking their movement/progress over time. All this is happening before we even come to consciously consider whether the timbre of an instrument is correct (in our judgement).
So when we listen to our 2 channel playback systems we are suspending some of the rules & expectations in this analysis - much the same as we do when watching TV, video, etc. Listening to our 2 channel stereo creates a working model which just about satisfies enough criteria to conclude it is realistic - in other words, it is just close enough to the working model that would be created if we were listening to the same event live that we can more easily enter into the engagement/immersion state, as we would if we were at the live event (I'm using "live event" for the sake of shorthand). If there is some anomaly in the sound from our reproduction system that forces perception to change its working model, then the more this happens, the more energy is consumed & the more fatigue/disinterest/discomfort results (again, this is happening subconsciously).
But I believe 2 channel stereo is a precarious thing on the edge of this division between "realism" & blabla/uninteresting sound - it takes a lot of the small things to be correct in the reproduced sound to satisfy these criteria. It's a surrogate for reality in much the same way as the actual recording is a surrogate for a musical event.
IMO, this explains a lot about this hobby but from a different perspective perhaps?
Yes, I agree that we are transported when listening to a good system & even music we are not familiar with is interesting - maybe not as interesting/engaging as music we know & love but still there's enough realism in it to engage us. My quip about the sound of background trains was really to point out that this detail isn't the goal but rather that realism/engagement is the goal. I do believe that this sort of low-level detail is necessary for realism.
I'm considering this at a lower level initially, as you see above, & what I'm suggesting is that from infancy onwards we absorb the world of sound, correlate it with the world of images & with these two senses build internal models of how objects behave in the world, both in their visual aspects & in their auditory aspects. So a bell sound has a sharp attack & a long decay (not the other way around) - a small bell produces a higher frequency than a large bell, etc. For the visual model, I think of a scene from Father Ted:
"small cow or far away"
I don't think this defines preference, as it happens to everybody as part of the development of our senses from birth. With regard to exposure to our replay systems, yes, I think we become familiar with their sound signature & in that way we evaluate new devices inserted into the system. Listening to live music on a daily basis should instil in us an innate expertise in how instruments/voices sound, I guess?