OK, here is part 2:
Key principles here:
1. The source is the master. It determines how fast or slow the samples are converted to analog in the DAC. The sampling rates you hear quoted, such as 44.1 kHz, are nominal values. They are NEVER used by the DAC to set the rate! The DAC must play the samples at the rate they arrive from the source. It cannot substitute its own clock and call it done.
2. The way the DAC determines the rate is to look at the pulses carrying the data on its digital input. By counting how many arrive per second, it can determine the speed at which it must play the content. If it sees 44,099 samples/sec, then that is the rate, not 44,100. The spec allows ±5% variation from the nominal sampling rate, by the way.
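The counting idea in point 2 can be sketched in a few lines. This is just an illustration, not how a real receiver chip is implemented (those work on the biphase-coded bitstream in hardware); the function name and the timestamp representation are my own invention:

```python
# Sketch: estimate the incoming sample rate by counting how many frames
# arrive per second, instead of trusting the nominal 44,100 figure.

def estimate_rate(frame_timestamps):
    """Estimate samples/sec from frame arrival timestamps (in seconds)."""
    if len(frame_timestamps) < 2:
        raise ValueError("need at least two frames")
    span = frame_timestamps[-1] - frame_timestamps[0]
    return (len(frame_timestamps) - 1) / span

# A source running one sample slow: 44,099 frame intervals in one second.
n = 44_099
stamps = [i / n for i in range(n + 1)]
print(round(estimate_rate(stamps)))  # 44099, not the nominal 44100
```

The point is simply that the measured arrival rate, not the label on the format, is what the DAC has to follow.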
3. The way the pulses are counted is to look at when they cross the 0 V point. Here is a sample measurement of S/PDIF by a random person on the Internet:
As you see, the waveform itself is pretty corrupted and doesn't look anything like an idealized square wave. By using the edges as they cross zero, we don't worry as much about all those nasties riding on top of the waveform, or about what level they sit at.
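Here is a toy version of that zero-crossing trick. The waveform, noise level, and function are all made up for illustration; the takeaway is that the crossing positions survive even when the tops and bottoms of the wave are messy:

```python
# Sketch: recover edge positions from a noisy square wave by looking
# only at where it crosses 0 V, ignoring amplitude junk on top.
import random

def zero_crossings(samples):
    """Return indices where the signal crosses 0 V between samples."""
    crossings = []
    for i in range(1, len(samples)):
        if samples[i - 1] < 0 <= samples[i] or samples[i - 1] >= 0 > samples[i]:
            crossings.append(i)
    return crossings

# Idealized square wave (period 16 samples) with bounded noise on top:
random.seed(0)
clean = [1.0 if (i // 8) % 2 == 0 else -1.0 for i in range(64)]
noisy = [v + random.uniform(-0.3, 0.3) for v in clean]
print(zero_crossings(noisy))  # [8, 16, 24, 32, 40, 48, 56]
```

The noise moves the waveform's levels around but never pushes it across zero between edges, so the recovered edge times are untouched. That is exactly why receivers time the zero crossings rather than the levels.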
4. If you look carefully, you see the edges are not perfect either. The slope of the waveform can change based on cable and endpoint characteristics, making that line more or less vertical. If so, the time measurement we make will be inaccurate. This is called "cable-induced jitter." Our timing started out perfect at the source, but by the time the signal traveled down the wire, it got corrupted. Unfortunately, the corruption can be data dependent, making its distortion highly unpredictable.
5. Now we get to the receiver. The receiver makes the above measurement but also implements a flywheel effect, like the heavy platter of a turntable. The flywheel's speed can be adjusted up and down over time, but it resists small, rapid changes. Applied here, the same idea means that if the variations are occurring very quickly, the player clock ignores them and keeps going at the rate it was. The fancy term for this is a Phase-Locked Loop, or PLL for short.
6. To still be sensitive to the source genuinely slowing down or speeding up (e.g. in the above example, where it is one sample slower than the nominal value), the PLL cannot throw out all variations. It must let some through. It also runs into another problem if it filters too much: it may take a long time to lock onto the incoming data rate. This is why on some DACs you select the input and nothing plays for a second or two, or even longer. So the design of the PLL becomes challenging: you have the conflicting requirements of adapting quickly to speed changes while not letting noise and jitter induced upstream get into your DAC. This is why even the most expensive DACs can still benefit from a clean upstream digital signal.
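Points 5 and 6 can be caricatured with a first-order smoothing filter standing in for the PLL's loop filter. A real PLL tracks phase, not just rate, so treat this purely as a sketch of the flywheel trade-off; the `alpha` knob and the numbers are mine:

```python
# Sketch of the "flywheel": smooth the measured rate so fast jitter is
# ignored but a genuinely slow/fast source is still followed.
import random

def flywheel(measured_rates, alpha=0.01, start=44_100.0):
    """Track the incoming rate slowly; small alpha = heavy flywheel."""
    rate = start
    history = []
    for m in measured_rates:
        rate += alpha * (m - rate)   # resist fast changes, follow slow ones
        history.append(rate)
    return history

# Jittery per-second measurements around a true rate of 44,099:
random.seed(1)
noisy = [44_099 + random.uniform(-20, 20) for _ in range(2000)]
out = flywheel(noisy)
print(round(out[-1]))  # settles near 44,099 with the jitter smoothed away
```

Notice the trade-off from point 6: shrink `alpha` and the output gets smoother but takes longer to walk from the 44,100 starting guess down to the real 44,099 rate, which is the toy analog of a DAC taking a moment to lock onto a new input.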
7. Our problem becomes complex because the impact of jitter goes up rapidly as you increase the bit depth and frequency. 16 bits doesn't sound like a big number, but it is. It says the system represents values that are 1/65,536 apart in amplitude. At 20,000 Hz, for a simplified jitter spectrum (a sine wave), this translates to roughly half a billionth of a second of timing accuracy demanded of your clock! Anything worse and the jitter generates distortion larger than the smallest step your digital system can represent.
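The arithmetic in point 7 can be checked with a common back-of-envelope formula. I'm assuming one particular error criterion here (one full LSB of error on a full-scale sine at its steepest point); the exact figure depends on that choice, which is why published numbers vary, but any variant lands in the same sub-nanosecond ballpark:

```python
# Sketch: a full-scale sine at frequency f slews fastest at its zero
# crossing, so the clock jitter that produces about one LSB of error in
# an N-bit system is roughly t_j = 1 / (2 * pi * f * 2**N).
import math

def jitter_limit(bits, freq_hz):
    """Approximate max clock jitter (seconds) before the amplitude
    error exceeds one least-significant bit."""
    return 1.0 / (2 * math.pi * freq_hz * 2 ** bits)

t = jitter_limit(16, 20_000)
print(f"{t * 1e12:.0f} ps")  # about 121 ps for 16 bits at 20 kHz
```

Run the same function with 24 bits and the requirement tightens by another factor of 256, which is why jitter that was harmless in early 16-bit gear becomes a real engineering problem in modern high-resolution systems.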
8. Note that just because you hear differences between systems, it doesn't mean it is all related to upstream jitter. The local clock in the DAC can also be disturbed by many other factors such as RF, power supply variations, and activity in the rest of the device, from DSPs to front panel displays. So you also have local jitter to add to the equation. In addition, you get electrical interference from an upstream device that shares a ground with the DAC. Again, remember how delicate these signals are, even at 16 bits.
Let me pause here and see if this is easy to digest so far.