Jack,
But that is excessively simple IMHO. There is more to it than Xmax and Xmin. When people establish the dynamic range of a concert, they talk of the ratio between the maximum level and the noise level of the room. But the room noise is useful information, part of the recording, and must be encoded: it "needs some bits", though I do not know how many. The classical definition of dynamic range compares the maximum level of undistorted signal with the noise from the electronics, which behaves randomly and does not represent any useful information. IMHO we cannot go directly from one to the other. That is why some people say that at least 18 or 20 bits are needed for subjective-quality listening at typical sound levels, and then add two extra bits for implementation losses, making it 20 or 22.
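For reference, the "classical" figure can be computed directly. This is a minimal sketch assuming an ideal N-bit quantizer, a full-scale sine and uniformly distributed quantization noise, which gives the textbook 6.02*N + 1.76 dB. It says nothing about how many bits the room noise itself "needs"; the function name is just illustrative.

```python
import math

def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine,
    assuming uniform quantization noise: about 6.02*N + 1.76 dB."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)

for bits in (16, 18, 20, 22, 24):
    print(f"{bits} bits: {ideal_dynamic_range_db(bits):.1f} dB")
```

Each extra bit adds roughly 6 dB, so the "two extra bits for implementation losses" amount to about 12 dB of margin on paper.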
The preference for a higher number of bits, or in Mark's case for DSD with its higher dynamic range, is a clear subjective indication that low levels are not clearly reproduced at 44.1/16. Our brain prefers the representation that has fewer errors.
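To see the size of the errors at low levels, here is a sketch that quantizes a quiet sine (peak at -80 dBFS) to 16 and 24 bits and measures signal power against quantization-error power. Assumptions: plain rounding with no dither, and an arbitrary 997 Hz tone at 48 kHz; the helper names are hypothetical. This only measures the numerical error; it does not by itself establish what listeners hear, and real converters add dither, which decorrelates this error from the signal and turns it into a steady noise floor.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x (nominally in [-1, 1]) to the nearest level of a bits-deep
    PCM quantizer, with no dither."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

def signal_to_error_db(level_dbfs: float, bits: int,
                       fs: int = 48_000, freq: float = 997.0,
                       n: int = 4_800) -> float:
    """Ratio of signal power to quantization-error power, in dB, for a
    sine whose peak is level_dbfs relative to full scale."""
    amp = 10 ** (level_dbfs / 20)
    sig_pow = err_pow = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / fs)
        err = quantize(x, bits) - x
        sig_pow += x * x
        err_pow += err * err
    return 10 * math.log10(sig_pow / err_pow)

for bits in (16, 24):
    print(f"{bits}-bit, -80 dBFS sine: {signal_to_error_db(-80, bits):.1f} dB")
```

At -80 dBFS a 16-bit word has only a few quantization steps to describe the waveform, while a 24-bit word has hundreds.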
It may seem excessively simple, my friend, but that is where it begins, together with the highest and lowest frequencies. It is these parameters that determine the choice of settings. For the former (level), the recordist will adjust gain if his bit depth is limited; for the latter (frequency), he will set the LPF/AAF (low-pass/anti-aliasing filter) if his sampling rate is limited. Most of the time, if not all of the time, the first limitation is the transducer used. As you know, the selection is based on a composite signal rather than a pure tone. That's where I see a lot of people getting hung up on sampling theory. They see a pure sine wave reconstructed perfectly and conclude that a composite signal must be reconstructed perfectly too. To me that gives short shrift to the scientists and engineers who developed digital audio in the first place and continue to try to improve it. The math being solid says nothing about performance, that is, the degree to which a device under test actually lives up to the math. Those are two separate issues.
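The LPF/AAF point can be made concrete with the folding rule for an ideal sampler: any component above fs/2 (the Nyquist frequency) reappears at a lower, wrong frequency. This is a minimal sketch assuming ideal sampling with no anti-aliasing filter at all; the function names and the example partials are hypothetical. A composite signal is treated partial by partial, since each component folds independently.

```python
def nyquist_hz(fs: float) -> float:
    """Highest frequency an ideal sampler at rate fs can represent."""
    return fs / 2

def alias_frequency_hz(f: float, fs: float) -> float:
    """Apparent frequency of a pure tone f after ideal sampling at fs,
    with no anti-aliasing filter (the spectrum folds about fs/2)."""
    f_mod = f % fs
    return fs - f_mod if f_mod > fs / 2 else f_mod

# A composite signal: each partial folds on its own.
partials_hz = [1_000, 15_000, 30_000]
for fs in (44_100, 96_000):
    apparent = [alias_frequency_hz(f, fs) for f in partials_hz]
    print(f"fs={fs} Hz, Nyquist={nyquist_hz(fs):.0f} Hz, partials appear at {apparent}")
```

At 44.1 kHz the 30 kHz partial would land at 14.1 kHz, inside the audible band, which is why the filter has to remove it before sampling; at 96 kHz it stays where it is.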
The question is: how many quantization levels are enough to give the semblance of a continuous COMPOSITE waveform? The answer depends on how faithful you want it to be. If it is merely for passable performance, many would be surprised how well 12-bit does. Going upwards, you get diminishing returns, because what is being chased is infinity.
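To put rough numbers on "how many quantization levels", the sketch below counts levels and the step size as a percentage of full scale for a few word lengths. It assumes plain linear PCM; the helper names are made up for illustration.

```python
def levels(bits: int) -> int:
    """Number of distinct codes in a linear PCM word of this length."""
    return 2 ** bits

def step_percent_of_full_scale(bits: int) -> float:
    """Size of one quantization step as a percentage of the full range."""
    return 100 / 2 ** bits

for bits in (12, 16, 20, 24):
    print(bits, levels(bits), f"{step_percent_of_full_scale(bits):.6f}%")

# Each extra bit halves the step, so the absolute improvement shrinks every time.
gain_12_to_13 = step_percent_of_full_scale(12) - step_percent_of_full_scale(13)
gain_23_to_24 = step_percent_of_full_scale(23) - step_percent_of_full_scale(24)
print(gain_12_to_13, gain_23_to_24)
```

In absolute terms the returns diminish with every bit, although in logarithmic terms each bit still adds the same ~6 dB.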
So where does one draw the line? In studios, people are now starting to record at 32-bit. Processing today is done on machines running 64-bit floating point. DACs today use 32-bit engines (mostly to allow digital volume control and more DSP). These people aren't doing it just because they can; there are very clear, very real benefits.
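One concrete reason a DAC's volume control benefits from a wider internal word is sketched below. Assumptions: a simplified model in which every 16-bit input sample is scaled by a -40 dB digital gain and rounded either back into a 16-bit word or into a 32-bit word; real DAC chips differ in architecture, and the function name is hypothetical.

```python
def attenuate(sample_16: int, gain: float, out_bits: int) -> int:
    """Scale a 16-bit sample by gain and round it into an out_bits-wide integer.
    The extra (out_bits - 16) bits sit below the original LSB as headroom."""
    shift = out_bits - 16
    return round(sample_16 * (2 ** shift) * gain)

gain = 10 ** (-40 / 20)  # -40 dB digital volume setting
inputs = range(-32768, 32768)  # every possible 16-bit sample value
distinct_16 = len({attenuate(s, gain, 16) for s in inputs})
distinct_32 = len({attenuate(s, gain, 32) for s in inputs})
print(distinct_16, distinct_32)
```

With the 16-bit output only a few hundred distinct levels survive the attenuation, while the 32-bit word keeps all 65,536 input levels distinct.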
Like you, I'm invested in quality CD playback because, like you, I have music in that format that I love. That is the only reason. While I have no intention of replacing my entire CD library with higher bit depth, higher sampling rate versions, I do know the comparative superiority of those formats over the 16-bit/44.1 kHz standard. Now, that is an important word: standard. It wasn't Nyquist or Shannon who set it; it was Sony and Philips. One need not look far to find why they chose it: it was what was practical and economical AT THAT TIME. It was way good enough to beat a cassette tape, or beat-up records played on a mass-market turntable with an MM cartridge set up by someone who didn't own so much as a cardboard alignment template.
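The "practical and economical" part is mostly arithmetic on data rates. This is a minimal sketch assuming uncompressed stereo PCM and ignoring CD subcode and error-correction overhead; the function names and the 24-bit/96 kHz comparison point are illustrative choices, not part of the original discussion.

```python
def pcm_bitrate(sample_rate: int, bits: int, channels: int = 2) -> int:
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate * bits * channels

def megabytes_per_minute(sample_rate: int, bits: int, channels: int = 2) -> float:
    """Storage for one minute of uncompressed PCM, in megabytes (10**6 bytes)."""
    return pcm_bitrate(sample_rate, bits, channels) * 60 / 8 / 1_000_000

cd = pcm_bitrate(44_100, 16)     # 16-bit/44.1 kHz stereo (CD audio)
hires = pcm_bitrate(96_000, 24)  # a common 24-bit delivery format
print(cd, hires, round(hires / cd, 2))
print(megabytes_per_minute(44_100, 16), megabytes_per_minute(96_000, 24))
```

16-bit/44.1 kHz stereo works out to 1,411,200 bit/s (about 10.6 MB per minute), while 24-bit/96 kHz needs about 3.3 times that.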
Well, the world tends to move on. 24-bit is now economical and practical too. While many people try to shoot it down as excessive, they should be glad that some producers are making it available at all, given how vulnerable that leaves them in this world of modern piracy.