Acknowledged, and I agree. Magazine reviewers are in a different position than I am; this technology is a magic box full of sparks to most reviewers, so they have to take the manufacturer's word (and hype) at face value. I don't. I want to know what's inside. That's what determines the sound, not the price-tag, ad-budget, or reviewer-schmoozing.
That said, I only have a bare minimum of technical understanding of these things. I've designed direct-radiator speakers for nearly thirty years, so I know them fairly well, vacuum-tube amps for fifteen years, so a little less, and horns for three years (but fortunate enough to have Bjorn Kolbrek, Jean-Michel LeCleac'h, and Martin Seddon of Azurahorn as mentors and collaborators).
What really baffles me about digital conversion is noise-shaping. To the best of my knowledge, it uses digital feedback around a single-bit or 5- to 6-bit switch array to linearize the array. By adding just the right amount of dither-noise, the switch is pulse-width modulated to achieve intermediate analog values that fall between what the switch can do on its own. A single-bit switch only has ON and OFF and is obviously grossly nonlinear, while a 6-bit switch is only somewhat linear, with 64 possible levels. Obviously far, far short of a 20-bit transmission system with about a million possible levels. This is where the PWM and digital feedback come in, synthesizing the intermediate levels.
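To make the "synthesizing intermediate levels" idea concrete, here is a minimal sketch of a textbook first-order delta-sigma loop around a 1-bit quantizer. It is only an illustration under my own assumptions: real converter chips use higher-order loops, multi-bit arrays, and dither, none of which appear here, and the names delta_sigma_1bit, target, and average are made up for the example.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator with a 1-bit quantizer (illustrative only).

    samples: iterable of floats in the range -1.0 .. +1.0.
    Returns a list containing only +1.0 and -1.0 (the two switch states).
    """
    integrator = 0.0   # running sum of the error between input and output
    feedback = 0.0     # previous switch state, fed back to the input
    out = []
    for x in samples:
        integrator += x - feedback                       # accumulate the error
        feedback = 1.0 if integrator >= 0.0 else -1.0    # the 1-bit "switch"
        out.append(feedback)
    return out


# Hypothetical test signal: a constant level that the switch cannot produce directly.
target = 0.3
bits = delta_sigma_1bit([target] * 10000)
average = sum(bits) / len(bits)
print(f"switch states used: {sorted(set(bits))}, long-term average: {average:.4f}")
```

The switch only ever outputs +1 or -1, yet the long-term average lands very close to 0.3. The intermediate level exists only as an average over many samples, which is why the converter has to run far faster than the audio rate and why a filter has to follow it.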
But stability problems can - and will - arise in complex feedback systems. In the analog world, we are limited by component parts variation and transit speeds through the forward path; get too complex and rely too much on the SPICE simulation, and the real world will bite you - hard. No fun to have a brand-new transistor amplifier destroy itself in a few milliseconds, and possibly take the speaker along, too.
In the digital world of noise-shaping, we get so-called "idle tones" and dynamic noise-floor modulation. The 20 dB jumps in noise-floor levels I saw at the 2011 RMAF presentation by ESS really got my attention; it was obvious that artifact-free noise-shaping design is extremely difficult.
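A rough way to see where idle tones come from is to feed a constant (DC) level into a first-order 1-bit loop with no dither and watch the output fall into a repeating pattern. This is a sketch under my own assumptions, not a model of the ESS parts or of any commercial converter; idle_pattern_period is a made-up helper, and the DC level is chosen as an exact binary fraction so the floating-point state repeats exactly.

```python
def idle_pattern_period(dc_level, max_steps=100000):
    """Return the repeat period (in samples) of an undithered first-order
    1-bit delta-sigma loop driven by a constant input, or None if no
    repetition is found within max_steps.

    Illustrative assumption: dc_level is an exact binary fraction such as
    1/16, so the loop state can be compared for exact equality.
    """
    integrator = 0.0
    feedback = 0.0
    seen = {}  # loop state -> sample index at which it first occurred
    for n in range(max_steps):
        integrator += dc_level - feedback
        feedback = 1.0 if integrator >= 0.0 else -1.0
        state = (integrator, feedback)
        if state in seen:
            # The loop is deterministic, so a repeated state means the
            # output repeats forever from here on.
            return n - seen[state]
        seen[state] = n
    return None


period = idle_pattern_period(1.0 / 16.0)
print("output pattern repeats every", period, "samples")
```

With an input of 1/16 the pattern repeats every 32 samples. A fixed repeating pattern is a tone, and its period depends on the input level, so the tone moves around as the signal changes. Dither and higher-order loops are meant to break these patterns up; how well they manage it is the hard part.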
I am also not comfortable with the way the digital guys call the rising wall-of-noise (above 20 kHz) that's created by noise-shaping "noise". It's not noise as analog guys know it; it's the chopped-up debris of switch-errors, shoved to the top of the audio band, and as far as I can tell, it is very much correlated with the audio signal. The beautiful thing about most sources of analog noise is that they are fully uncorrelated with the audio signal. If the transmission path has good low-level linearity, the noise will stay that way, too, all the way to the loudspeaker. As long as it is uncorrelated, the ear/brain/mind system will easily ignore it, just as we ignore audience noise at a live concert. Once correlation starts, though, all bets are off.
After living with delta-sigma converters, I now feel that the noise-shaping algorithm is what we're hearing. What makes this problematic is that the artifacts of noise-shaping are unfamiliar to most audiophiles and reviewers; once again, we're back in the "perfect sound forever" world, before we realized what lack of dither, jitter, and sample-rate conversion artifacts sounded like. I'm convinced that over time, we'll all start to hear the artifacts and recognize them.