A 3-D soundstage occurs when the spatial cues on the recording are presented to the listener. The cues are generally reverberations or echoes that our brain uses to calculate a sense of space. These cues must be reproduced with little enough distortion, alteration or truncation that the psychoacoustic requirements are met for the brain to calculate this space. This results in suspension of disbelief, so the listener can experience the music much as they would hear it at the recording venue. This is the "You Are There" presentation.
OTOH, we have the "They Are Here" presentation, in which spatial cues formed by the playback system dominate instead of the spatial cues in the recording. The psychoacoustic requirements for processing space are no longer met, so the brain no longer perceives the recording venue, because either the system fails to play back the cues, or the room's acoustic cues are dominant, or some of both. You can still get precise imaging, but timbre will be negatively affected as well, because timbre depends on cues similar to those for space: the measured decay of sound, how the reverberations or notes formed by an instrument or vocal trail off. The decay of sound is often truncated by both the recording and playback systems, so in order to preserve the decay and achieve a "You Are There" presentation, we must maximize resolution and have a true "High Fidelity" system.
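To put a rough number on what "truncated decay" means, here is a minimal Python sketch. It rests on assumptions that are mine, not measurements: the decay is an ideal exponential that falls 60 dB over a reverberation time (RT60), and the recording or playback chain is modeled as a simple floor some number of dB below the peak, under which the tail is lost. The function name and the example values are hypothetical.

```python
def audible_decay_seconds(rt60_s: float, floor_db_below_peak: float) -> float:
    """Time an ideal exponential decay stays above a floor.

    Assumes the level drops linearly in dB at 60 dB per rt60_s seconds,
    so the tail crosses a floor D dB below the peak after rt60_s * D / 60.
    """
    if rt60_s <= 0 or floor_db_below_peak < 0:
        raise ValueError("rt60_s must be positive and the floor non-negative")
    return rt60_s * floor_db_below_peak / 60.0


if __name__ == "__main__":
    venue_rt60 = 2.0  # hypothetical venue decay time in seconds
    for floor in (40.0, 60.0, 80.0):  # hypothetical floors, dB below peak
        seconds = audible_decay_seconds(venue_rt60, floor)
        print(f"floor {floor:.0f} dB below peak: tail lasts {seconds:.2f} s")
```

Under this simplified model, every 10 dB of floor that is lost removes one sixth of the RT60 from the tail, which is one way to picture how a chain with less resolution shortens the decay.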
IME, those who prefer a subjective sound, such as trying to recreate what they have heard live rather than simply trying to maximize fidelity, usually come around to the true definition of High Fidelity eventually. To do otherwise reduces the "You Are There" effect, which, as I mentioned previously, is the number 1 driver of subjective preference IF the listener is presented with a sound that achieves it.
"You Are There" requires crossing a psychoacoustic barrier; in other words, the brain must be given enough information that it can comfortably fill in any gaps. Beyond that barrier, the more of those gaps the system fills, the more successful the illusion.
Further, the best systems are able to actually extend the truncated decay of sound so that it more precisely matches what the brain expects to hear. This is what the multi-channel proponents miss. They try to achieve this by adding more speakers to fill in the soundstage, but what we really need is proper decay. Extending decay can be done via electromechanical feedback (this is what racks, footers, tube dampers and phono mats and devices do), or via some acoustic devices. The subjective preference for vinyl and tubes can be partially explained by the fact that they are electromechanical feedback devices more so than other parts of the system.
This is also why folks who don't think AC power or cables make a difference can't achieve a "You Are There" presentation. Without the entire system working to a high level, without true "High Fidelity", the decay is further truncated and falls below the levels required psychoacoustically, so you end up with a lower level of spatial and timbral performance. Similarly, those who think an extremely "live" room sounds the best will not achieve it either. A poor waterfall plot with too much room decay is a major issue, because those room effects will often dominate over what's on the recording.
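For anyone who wants to check how "live" their room is, here is a rough Python sketch of estimating decay time from an impulse response with Schroeder backward integration, a standard room-acoustics technique. It is only a broadband illustration, not a waterfall plot (which shows decay per frequency), and the impulse response below is synthetic: an ideal exponential with a hypothetical 0.5 s RT60, standing in for a real measurement. All function names are my own.

```python
import math


def schroeder_decay_db(impulse):
    """Backward-integrated energy decay curve in dB (0 dB at the start)."""
    tail = [0.0] * len(impulse)
    total = 0.0
    # Accumulate energy from the end of the response toward the start.
    for i in range(len(impulse) - 1, -1, -1):
        total += impulse[i] * impulse[i]
        tail[i] = total
    return [10.0 * math.log10(e / tail[0]) for e in tail if e > 0.0]


def estimate_rt60(impulse, sample_rate, hi_db=-5.0, lo_db=-25.0):
    """Fit a line to the decay curve between hi_db and lo_db, extrapolate to 60 dB."""
    curve = schroeder_decay_db(impulse)
    pts = [(i / sample_rate, db) for i, db in enumerate(curve) if lo_db <= db <= hi_db]
    if len(pts) < 2:
        raise ValueError("not enough decay range to fit a slope")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_db = sum(db for _, db in pts) / len(pts)
    # Least-squares slope in dB per second (negative for a decaying response).
    slope = (sum((t - mean_t) * (db - mean_db) for t, db in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))
    return -60.0 / slope


def synthetic_impulse(rt60_s, sample_rate, length_s):
    """Ideal exponential envelope whose amplitude falls 60 dB in rt60_s seconds."""
    n = int(sample_rate * length_s)
    return [10.0 ** (-3.0 * (i / sample_rate) / rt60_s) for i in range(n)]


if __name__ == "__main__":
    fs = 8000  # hypothetical sample rate in Hz
    ir = synthetic_impulse(rt60_s=0.5, sample_rate=fs, length_s=1.0)
    print(f"estimated RT60: {estimate_rt60(ir, fs):.3f} s")
```

With a real measured impulse response in place of `synthetic_impulse`, a long estimated decay relative to the reverberation on the recording would be consistent with the room's own cues dominating.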