A switching power supply takes the AC input and converts it to DC. It then switches that DC on and off at a much higher frequency than the line input (60 Hz). The much higher frequency allows the transformer to be much smaller and the device to achieve much higher efficiency. This is why switching power supplies are smaller, lighter, and cooler than conventional linear power supplies, which run at 60 Hz.
The side effect is that we are now switching high-voltage signals on and off. Doing so generates a lot of noise -- both in the produced output and radiated from the device. Filters are deployed to tame it, but there is still switching noise that did not exist in a linear power supply.
The theory around why this may impact a digital source (as opposed to a DAC) is that such switching noise can affect the timing of the digital clock produced together with the audio samples. Downstream devices, unless they run in "asynch mode" as you mentioned, rely on this clock to decide when to convert their digital samples to analog audio values. It is totally counterintuitive that CD audio running at just 44,100 samples per second would require high accuracy. But it does. If we want to reproduce 16-bit values perfectly at 20 kHz, for example, the clock accuracy must be 0.25 billionth of a second! Yes, that is a quarter of a billionth of a second. Now, we don't usually care about fully reproducing such accuracy, as the ear is likely not that sensitive in that region, but at the theory level we have an incredibly difficult challenge: the timing of audio samples is always "analog" in a digital system. The net is that we have a super-sensitive signal that has to have incredible accuracy, and it is readily impacted by many factors, including power supply noise.
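The quarter-of-a-nanosecond figure follows from a simple slew-rate argument: a full-scale 20 kHz sine changes fastest at its zero crossing, and you can ask how long the signal takes to move by one 16-bit LSB at that point. A back-of-the-envelope check (my own sketch, not from any measurement):

```python
import math

bits = 16
f = 20_000  # worst-case audio frequency, Hz

# A full-scale sine x(t) = A*sin(2*pi*f*t) has a maximum slew rate of
# 2*pi*f*A at its zero crossing. One LSB spans 2A / 2**bits of the range,
# so the time for the signal to move one LSB at max slew is:
lsb_time = (2 / 2**bits) / (2 * math.pi * f)

print(f"Clock error for 1 LSB at {f} Hz, {bits} bits: {lsb_time * 1e12:.0f} ps")
```

A clock error bigger than roughly 243 picoseconds at that worst-case moment shifts the sampled value by more than one LSB -- which is where the "quarter of a billionth of a second" comes from.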
These timing variations, called jitter, are caused by many factors. If you look at devices using linear power supplies and examine their clock jitter spectrum, we often find a spike at 60 Hz, telling us that the power supply frequency does leak into the clock timing. Here is an old example:
http://www.stereophile.com/features/368/index6.html
"The Pioneer CD-65, seen in figs.25 and 26, had higher jitter than the best transports, but lower jitter than the JVC XLZ-1010. We can see a jitter spike at 60Hz, no doubt due to power-supply noise."
See the bold line spiking at 60 Hz:
At 60 Hz, though, the frequency is very low and not where the ear is sensitive. But a switching power supply runs at many kilohertz, so it can cause jitter at that rate. The way jitter works is that you get sum and difference signals between the original clock and whatever is interfering with it. The result is that, depending on the switching frequency of the power supply and what is in the original audio signal, we could get distortion that lands in the mid-range, audible frequencies.
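To illustrate the sum-and-difference mechanism, here is a small simulation (my own sketch; the 1 ns jitter amplitude and 2 kHz interference frequency are arbitrary assumptions, not measurements of any real supply). It samples a tone with sinusoidally jittered sample times and shows sidebands appearing at the tone frequency plus and minus the jitter frequency:

```python
import numpy as np

fs = 44100          # sample rate, Hz
n = fs              # one second of samples -> 1 Hz FFT bins
f0 = 11025          # test tone, lands exactly on an FFT bin
fj = 2000           # assumed jitter (interference) frequency, Hz
aj = 1e-9           # assumed jitter amplitude: 1 ns peak

# Ideal sample instants, perturbed by sinusoidal timing jitter
t_ideal = np.arange(n) / fs
t_actual = t_ideal + aj * np.sin(2 * np.pi * fj * t_ideal)

x = np.sin(2 * np.pi * f0 * t_actual)

spec = np.abs(np.fft.rfft(x)) / (n / 2)    # amplitude spectrum
carrier = spec[f0]
sidebands = spec[f0 - fj], spec[f0 + fj]   # difference and sum products

level_db = [20 * np.log10(s / carrier) for s in sidebands]
print(f"carrier at {f0} Hz, sidebands at {f0 - fj} and {f0 + fj} Hz")
print(f"sideband levels: {level_db[0]:.1f} dB, {level_db[1]:.1f} dB")
```

Even this tiny 1 ns of jitter produces sidebands roughly 89 dB below the tone, a pair of new spectral components that were never in the original signal. Larger jitter, or jitter at frequencies that fold closer to the tone, pushes these products higher and into more audible territory.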
Of course, there are a million different power supplies, and we don't know what level of jitter they cause or at what frequency. That is why I said there is no definitive explanation here unless someone measured jitter before and after. Unfortunately, jitter measurement gear costs $20K+, and the companies making these mods don't have the money to buy such expensive equipment.