1970s solid state may be going too far LOL
Seriously though, everything is intertwined to some extent, some things greatly. If I were to break it down, it would be the following three, as mentioned earlier:
system resolution
noise floor: mechanical, electrical and acoustical
benign acoustics, meaning a gentle reverberation time curve, because we're looking at all decays, not just bells and triangles. Long-tail LF reverberation may just be the hardest thing to get right in small spaces
While system resolution is fairly constant, the other two are hurdles that rob it of its maximum potential. There comes a point where any improvement in system resolution will be lost or degraded. In other words, we hit a glass ceiling where we're just throwing time, money and effort away by being too gear-centric.
Not mentioned, for what should be obvious reasons, is the software. The decays are either in the media or they aren't. My presumption is that our discussion revolves around good recordings and that we're not looking to squeeze blood out of stones. We have reverb units to do that! LOL
I think that we have to look at what is truly a noise floor and what is distortion masquerading as a noise floor. First of all, true noise is random, and it is well known that it is possible to hear sounds that are below the noise floor when those sounds are correlated (i.e. music). As an example, it is possible to hear sounds in an old recording that are lower in level than the background tape hiss, which is one type of true random noise.
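To put a rough number on the "below the noise floor" idea, here is a toy sketch of my own (not a model of how the ear works). It buries a steady tone 10 dB below broadband random noise and then correlates the mix against a reference sinusoid, which is the crudest possible stand-in for a narrowband detector. The record length, bin numbers and levels are all assumptions picked for illustration.

```python
import math
import random


def bin_amplitude(samples, k):
    """Estimate the amplitude of the component that completes k whole cycles in the record."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2 * math.hypot(re, im) / n


random.seed(1)            # fixed seed so the run is repeatable
N = 8192                  # record length in samples (assumed)
TONE_BIN = 200            # tone completes 200 cycles in the record (assumed)
OFF_BIN = 333             # a frequency where only noise is present (assumed)
NOISE_RMS = 1.0
# Tone power is one tenth of the total noise power, i.e. 10 dB below the noise.
TONE_AMPLITUDE = math.sqrt(2 * 0.1) * NOISE_RMS

mix = [
    TONE_AMPLITUDE * math.sin(2 * math.pi * TONE_BIN * i / N) + random.gauss(0.0, NOISE_RMS)
    for i in range(N)
]

snr_db = 10 * math.log10((TONE_AMPLITUDE ** 2 / 2) / NOISE_RMS ** 2)
at_tone = bin_amplitude(mix, TONE_BIN)
off_tone = bin_amplitude(mix, OFF_BIN)

print(f"broadband SNR: {snr_db:.1f} dB")
print(f"amplitude found at the tone frequency: {at_tone:.3f}")
print(f"amplitude found where there is only noise: {off_tone:.3f}")
```

The tone stands well clear of the noise in its own narrow band even though it is far below the total noise level, because the noise is spread across all frequencies while the tone is concentrated in one. The ear's filtering is much more complicated than this, so take it only as an illustration of why a random floor does not act as a hard cutoff.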
From a theoretical point of view, each individual active and passive element will add a bit of noise (the lowest level being simple thermal or shot noise) to the overall system noise floor. Sum them up (as powers, since uncorrelated noise sources add root-sum-square rather than linearly, and assuming you know the values for each one) and you should be able to get an estimate of the overall background noise level. Of course things like ground loops etc. are noise that is not intrinsic to the devices and will additionally contribute to this basic level. The same is true for the electromechanical systems, where we have some basic level of noise for a driver moving in and out, as well as self-noise of the driver from bending and breakup modes. Then there is cabinet resonance, but these things are not intrinsic noise as they are not truly random, nor are they frequency independent. Therefore, we can think of these as distortions that can also add to the noise floor of what is perceptible.
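As a back-of-envelope version of that estimate, the sketch below uses the standard Johnson (thermal) noise formula v = sqrt(4kTRB) for a resistor and combines uncorrelated sources as a root-sum-square. The resistor value, temperature, bandwidth and the per-stage noise figures are hypothetical, and it assumes every contribution has already been referred to the same point in the chain (i.e. scaled by whatever gain follows it).

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def thermal_noise_vrms(resistance_ohm, bandwidth_hz, temperature_k=300.0):
    """RMS thermal noise voltage of a resistor over the given bandwidth."""
    return math.sqrt(4 * BOLTZMANN * temperature_k * resistance_ohm * bandwidth_hz)


def total_uncorrelated(noise_vrms_values):
    """Combine uncorrelated noise voltages: powers add, so voltages add root-sum-square."""
    return math.sqrt(sum(v ** 2 for v in noise_vrms_values))


# Hypothetical chain, all values referred to the same node:
resistor_noise = thermal_noise_vrms(1_000, 20_000)  # 1 kOhm over a 20 kHz bandwidth
stage_noise = [resistor_noise, 2e-6, 5e-6]          # made-up figures for two further stages

print(f"1 kOhm resistor: {resistor_noise * 1e6:.3f} uV rms")
print(f"whole chain:     {total_uncorrelated(stage_noise) * 1e6:.3f} uV rms")
```

Note how the largest single contributor dominates the total. That only holds for truly random, uncorrelated noise; hum from ground loops and the signal-related effects discussed here do not combine this way.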
Some speaker driver concepts are inherently better at tracing very small signals and converting that electrical energy into audible mechanical vibration. Compression drivers, for example, with small, light diaphragms and huge magnetic motors, loaded into a horn, will be far more responsive to microvolt or even nanovolt signals being fed to them...assuming these aren't lost BEFORE they get to the driver. Other lightweight, high magnetic strength/high sensitivity drivers will be inherently more responsive than heavier, lower field strength/low sensitivity drivers...even if the low sensitivity driver has been placed into an ideal environment where the cabinet and crossover do not interfere further with the handling of those tiny voltage fluctuations. Electrostatic speakers capture this well despite a low electrical sensitivity because they still respond down to very low inputs, and a large surface producing a tiny motion is often still audible. Ribbons, IME, if high sensitivity (they can be with modern Nd magnets), are superb at capturing very small inflections of the voltage given to them. However, older, lower sensitivity models were less successful unless played above a certain level to "wake up" the speaker...much as is necessary for most box speakers in the 85-88 dB range. Too many losses contribute to a "noise" floor of unresponsiveness.
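For a sense of the voltages involved, here is a small sketch that works out the drive voltage needed for a given SPL at 1 m from a sensitivity rating. It assumes the rating is quoted in dB at 2.83 V / 1 m and that the speaker is perfectly linear, so it says nothing about whether low-level losses actually occur; it only shows the size of the signals in question. The 86 dB and 106 dB figures and the 20 dB SPL target are hypothetical examples.

```python
REFERENCE_VOLTS = 2.83  # assumed reference drive level for the sensitivity rating


def volts_for_spl(target_spl_db, sensitivity_db):
    """Drive voltage (rms) for target_spl_db at 1 m, assuming an ideal linear speaker."""
    return REFERENCE_VOLTS * 10 ** ((target_spl_db - sensitivity_db) / 20)


quiet_detail_db = 20.0  # a very soft decay tail, hypothetical
for name, sensitivity in [("box speaker", 86.0), ("horn system", 106.0)]:
    volts = volts_for_spl(quiet_detail_db, sensitivity)
    print(f"{name} ({sensitivity:.0f} dB): {volts * 1e3:.4f} mV rms for {quiet_detail_db:.0f} dB SPL")
```

A 20 dB difference in sensitivity is a factor of ten in voltage, so the high sensitivity system is working with signals ten times smaller at the same acoustic level, which is why anything lost upstream matters so much.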
The problem with many so-called "noises" is that they are in some way correlated with the music and therefore mask one or more frequency bands, which obscures and reduces low-level resolution. That resolution is vital for maintaining clarity of the acoustic space and of instrumental decay. This is why room reflections and reverberation can impact clarity and the perception of instrumental decay. So we can chalk up distortions from room effects as also contributing to the noise floor of what is perceptible. Why do I call them distortions? Because they are not intrinsic to the function of the system, and they are dynamic rather than static (i.e. ever present): they change with both level and frequency...like what is going on in the speaker micro-environment.
Another place to look for this masking is in the electronics themselves. This might be the most insidious and most widely disregarded source, even though I have found it to be one of the more important issues. It was pointed out by Norman Crowhurst in the late 1950s that the use of negative feedback can result in an artificial noise floor that is SIGNAL modulated. This is very harmful to the decay of instruments and, as a corollary, to hall decay and the perception of acoustic space. The problem is that this noise floor moves with the music and is no longer a true noise. This means that signals that drop below this new "noise" floor are lost to our hearing, unlike with tape hiss, and the sound is heard to be truncated. The degree to which this occurs depends on the whole signal chain and how many components are contributing to this new signal-modulated floor. Can you hear the decay of piano notes in the presence of the orchestra playing behind (or even when the pianist plays the next notes)? Can you still hear the effects of the hall when the orchestra is playing fff? If the answer is truly yes, then your system has what Allen Wright used to call downward dynamic range (DDR). The ability for those soft sounds not to be masked in the presence of louder sounds is difficult for most systems because of this electronic masking (as well as the more mechanical and electro-mechanical issues I described above). Think about it like this: you have the decay from a timpani in the background. How long can you still hear it when the next strike comes or when the horns blare? If you have an issue in your system where the noise floor is modulated by the signal intensity, then your background "noise" will leap up and "swallow" the small decay signal, masking it from further perception.
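Here is a toy calculation of that "swallowing" effect. It is my own simplification, not Crowhurst's or Wright's analysis: the decay falls at a constant rate in dB per second, the modulated floor is assumed to sit a fixed number of dB below the loudest signal present, and whichever floor applies is treated as a hard cutoff. All levels are hypothetical, and ordinary psychoacoustic masking by the loud instruments themselves is ignored.

```python
def audible_decay_seconds(start_db, decay_db_per_s, floor_db):
    """Time until a linearly decaying (in dB) sound drops to the floor."""
    return max(0.0, (start_db - floor_db) / decay_db_per_s)


def modulated_floor_db(static_floor_db, loudest_signal_db, floor_below_signal_db):
    """Floor that rides up with the loudest signal but never drops below the static floor."""
    return max(static_floor_db, loudest_signal_db - floor_below_signal_db)


# Hypothetical timpani tail: starts at 80 dB SPL and falls 30 dB per second.
tail_start_db, tail_rate = 80.0, 30.0
static_floor = 20.0  # quiet system and room, assumed

# Case 1: floor stays put while the horns blare at 100 dB.
clean = audible_decay_seconds(tail_start_db, tail_rate, static_floor)

# Case 2: floor rides 50 dB below the loudest signal (assumed figure).
raised = modulated_floor_db(static_floor, loudest_signal_db=100.0, floor_below_signal_db=50.0)
masked = audible_decay_seconds(tail_start_db, tail_rate, raised)

print(f"static floor at {static_floor:.0f} dB: tail audible for {clean:.1f} s")
print(f"modulated floor at {raised:.0f} dB: tail audible for {masked:.1f} s")
```

With these made-up numbers the audible tail is cut in half the moment the loud passage arrives, which is the truncation I am describing.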
The "dryness" and flatness of images with most Class D amplification comes out of this masking by a modulated noise floor, as well as high-order harmonic distortion and true switching noise (though this too might be correlated if it intermodulates with the signal). It pushes high frequency details "forward" in our perception by making them seem louder than they really are. There seems to be more detail, but decay, and the acoustic space that is defined by decays, is truncated and therefore flattened in our perception because louder is perceived as closer. When high frequency reproduction is done truly cleanly, without grunge, IMD and high-order harmonic distortion, soundspaces truly open up and not only depth but 3D imaging and layering become possible. Once this is achieved, decay sounds natural both from the instruments themselves and from the space around the instruments, and those two things can be easily distinguished in space and time.
One last electronic barrier is the power. The cleaner the power, the closer you will get to the theoretical ability of your system. The problem seems to be that it is nearly impossible for stuff on the power line not to intermodulate to some degree with the signal through the amplification chain. I have heard some rather startling effects from removing this noise and it is almost spooky. However, for amps it is not always a good solution because of the power draw and the current limiting that most filters/regenerators impose. We have found that for sources a power regenerator is almost indispensable and works better than filters, which reduce but do not eliminate the issue.
Of course the best electronics in the world will only help to preserve this delicate information on its way to the speakers/room. Speakers that are either unresponsive or excessively mask critical frequencies (or do both) will often sound truncated...although this can be greatly improved with good electronics choices or greatly worsened by poor ones. Rooms can mask as well, but I have had good success even in untreated rooms with directivity-controlled speakers.
Where does that leave us? All electronic components in the chain (including power distribution) need to be free from generating a signal-modulated "noise" floor. I have heard so many systems where the electronics simply got congested and flat sounding once the music got busy and complex (not even necessarily at high volumes). That confusion and congestion is what one hears when the noise floor has come up to swallow the vital sounds that allow us to maintain a precise perception of where the music is coming from.
How do I know it was the electronics? We swapped in much better ones and the whole problem vanished like a morning fog. Clearly the speakers and room were then not to blame. I have heard this almost too many times to count...far more often than I have heard the same system moved to another room and improve so dramatically. Naturally, attention to the speaker choice and the room setup/treatment will pay dividends and is vital to "getting it right".