We'll have to agree to disagree on this one, Amir. For the theory, look at something like
Discrete-Time Signal Processing by Oppenheim and Schafer (pretty sure that's the one -- mine's at work). I do believe ideal quantization generates only odd-order terms, but of course there are other things at play, e.g. the relationship of the clock to the signal, that cause every bin to fill in even in an ideal system. That's why the usual white-noise approximations are valid and standardly used. (Standardly, is that a word? I be an engineer, not a grammar person...) To me, noise and distortion arise from different things in the circuit and have a very different impact on the output. I do not treat them the same. Maybe that's one difference between us low-brow hairy-knuckled engineers and the high-brow scientist types, but it's in me blood.
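A quick numerical sketch of the odd-order claim, if anyone wants to check it at home (all the parameters here are just illustrative choices of mine, not from the book): coherently sample a sine, run it through an ideal midtread quantizer, and compare the power in the odd vs. even harmonics.

```python
import numpy as np

N = 4096                 # record length
cycles = 127             # prime cycle count -> coherent sampling, no leakage
bits = 6                 # coarse quantizer so the harmonics are easy to see
x = np.sin(2 * np.pi * cycles * np.arange(N) / N)

step = 2.0 / (1 << bits)            # quantizer step for a +/-1 full-scale input
xq = step * np.round(x / step)      # ideal midtread quantization

X = np.abs(np.fft.rfft(xq))

def harmonic_power(h):
    """Power in harmonic h of the fundamental, folded back into 0..N/2."""
    b = (h * cycles) % N
    return X[min(b, N - b)] ** 2

odd = sum(harmonic_power(h) for h in (3, 5, 7, 9))
even = sum(harmonic_power(h) for h in (2, 4, 6, 8))
print(f"odd/even harmonic power ratio: {odd / max(even, 1e-300):.2e}")
```

The even harmonics come out at the numerical noise floor, because the midtread quantizer's error is an odd function of the input.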
Truncation to 16 bits is not the same as sampling with a 16-bit system; truncation will add distortion. In that paper, I think dither is primarily masking the truncation errors, and you need a goodly amount of energy to do that, thus the relatively high noise floor. Dither normally only needs to be a few LSBs, raising the noise floor by only a few dB, unless there is a lot of nonlinearity. At least, that's the way it works in the systems I have worked with (audio and RF, though we play other games in the RF world to decorrelate the spurs and not raise in-band noise). If I ever built a 16-bit converter with spurs that high (only about -40 dBFS, about what you might get out of a 5- or 6-bit converter), I'd be fired, or at least tied to my desk until I fixed it. Maybe I should run some plots to show the differences, hmmm... In any event, the paper is an interesting and useful look at the impact of dither, but the test case is unrealistic. I suspect something else is going on...
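Roughly the plots I have in mind (signal level, record length, and dither amount are all just illustrative): requantize a small sine to 16 bits straight, and again with about an LSB of TPDF dither, then compare the worst error line and the total error power.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 32768
cycles = 997                              # coherent sampling, no window needed
lsb = 2.0 / (1 << 16)                     # 16-bit step on a +/-1 scale
x = 8.3 * lsb * np.sin(2 * np.pi * cycles * np.arange(N) / N)  # small sine

plain = lsb * np.round(x / lsb)           # straight 16-bit requantization
tpdf = rng.uniform(-0.5, 0.5, N) + rng.uniform(-0.5, 0.5, N)   # +/-1 LSB TPDF
dithered = lsb * np.round((x + tpdf * lsb) / lsb)

def worst_line(y):
    """Biggest spectral line (power) in the error signal y - x."""
    return (np.abs(np.fft.rfft(y - x)) ** 2).max()

spur_plain = worst_line(plain)
spur_dith = worst_line(dithered)
print(f"worst error line, plain/dithered: {spur_plain / spur_dith:.1f}x")
```

The undithered error piles up in discrete harmonic lines (spurs); the dithered error is spread flat, so the worst line drops by a lot while the total error power rises a few dB -- exactly the trade I was describing.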
Also, the impact of noise decorrelation (dither) is a bit different in delta-sigma designs (which I assume is the architecture, given the rising noise in the response you plotted) than in a conventional converter. The early 1-bit, low-order loops were very prone to tones, and dither was (and is, for that matter, in any such loops) required to suppress tones in the output of the delta-sigma modulator (ADC or DAC). Technology and techniques have advanced so that modern architectures are more complex, higher-order, and often multi-bit, so dither is less critical for stability and tones, though it is often still added to produce a more pleasing noise floor.
To address your first point last, we agree on that one! That's why I am (usually) careful to distinguish between random and deterministic jitter. And, though I have yet to really research it in depth, I still feel deterministic jitter is a far worse culprit than random jitter for us (no matter what frequency we operate at).
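A little sketch of why I see it that way (jitter amounts and frequencies below are illustrative, not any measured system): sample a sine with random jitter and with single-tone (deterministic) jitter of the same rms. The random jitter just lifts the noise floor; the deterministic jitter concentrates the same energy into discrete sidebands, i.e. spurs.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 8192
k0 = 997                                  # signal bin (coherent sampling)
n = np.arange(N)
jrms = 5e-3                               # jitter rms, in sample periods

rand_j = rng.normal(0.0, jrms, N)                          # random jitter
det_j = np.sqrt(2) * jrms * np.sin(2 * np.pi * 37 * n / N) # single-tone jitter

def spectrum(jitter):
    """Power spectrum of a sine sampled at the jittered instants."""
    return np.abs(np.fft.rfft(np.sin(2 * np.pi * k0 * (n + jitter) / N))) ** 2

def worst_spur(P):
    """Biggest line excluding the fundamental bin."""
    P = P.copy()
    P[k0] = 0.0
    return P.max()

spur_det = worst_spur(spectrum(det_j))
spur_rand = worst_spur(spectrum(rand_j))
print(f"worst spur, deterministic/random jitter: {spur_det / spur_rand:.1f}x")
```

Same rms timing error, wildly different spectra: the deterministic case puts sidebands at the signal frequency plus/minus the jitter frequency, which is why it is the worse culprit for us.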
Back to practicing - Don