Amir, while the benefit of a separate clock line is debatable, would you care to discuss what in the protocols themselves might make one or the other better in terms of jitter (or anything else)? Data encoding and such that could lead to lower/higher jitter, BER, etc. I don't know the standards off-hand and a quick summary might be useful (?)
The separate clock eliminates the data-dependent jitter that can occur on S/PDIF. Outside of that, yes, it doesn't necessarily make things better or worse.
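For what it's worth, here is a rough toy simulation of that mechanism (my own Python sketch, not anything pulled from the actual measurements). S/PDIF embeds the clock in a biphase-mark coded stream, so once the link is band-limited, the voltage the signal starts from at each transition depends on the preceding bits, and the recovered zero-crossing times shift with the data. The bit rate, cable bandwidth and oversampling below are made-up illustrative numbers:

```
# Toy sketch of data-dependent jitter from biphase-mark coding over a
# band-limited link. All parameters are illustrative assumptions, not
# measurements of any real device or cable.
import numpy as np

rng = np.random.default_rng(0)

OSR = 64                 # samples per half bit-cell (oversampling ratio)
N_BITS = 2000            # number of random data bits to encode
BIT_RATE = 3.072e6       # roughly a 48 kHz stereo S/PDIF data rate (assumed)
FS = BIT_RATE * 2 * OSR  # simulation sample rate
CABLE_BW = 2e6           # assumed -3 dB bandwidth, kept low to make the effect visible

def biphase_mark_encode(bits, osr):
    """Biphase-mark: transition at every cell boundary, extra mid-cell
    transition for a '1'. Returns a +/-1 waveform."""
    level = 1.0
    out = []
    for b in bits:
        level = -level                      # transition at cell start
        out.append(np.full(osr, level))
        if b:                               # mid-cell transition for a 1
            level = -level
        out.append(np.full(osr, level))
    return np.concatenate(out)

bits = rng.integers(0, 2, N_BITS)
tx = biphase_mark_encode(bits, OSR)

# Crude one-pole low-pass as a stand-in for a band-limited cable/receiver.
alpha = 1.0 - np.exp(-2 * np.pi * CABLE_BW / FS)
rx = np.empty_like(tx)
acc = 0.0
for i, x in enumerate(tx):
    acc += alpha * (x - acc)
    rx[i] = acc

# Recover transition times from zero crossings (linear interpolation).
idx = np.where(np.signbit(rx[:-1]) != np.signbit(rx[1:]))[0]
t_cross = idx + rx[idx] / (rx[idx] - rx[idx + 1])

# Ideal transitions land on multiples of OSR samples; the residual spread
# is the data-dependent timing error. Skip a few crossings of startup.
err = (t_cross - np.round(t_cross / OSR) * OSR) / FS
err = err[10:]
print(f"data-dependent jitter: {np.std(err)*1e9:.2f} ns rms, "
      f"{(err.max()-err.min())*1e9:.2f} ns p-p")
```

With a separate, dedicated clock line there is no embedded clock to recover from the data, so this particular mechanism goes away.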
I don't think the problem has anything to do with the protocol. It simply is the case that to get audio, you have to decode and process everything that comes over HDMI. This lights up a ton of circuits in the receiver, and that creates a lot of opportunities for coupling into the DAC inside the receiver. I saw two manifestations of this:
1. Low frequency noise. This was random noise bleeding into the clock circuits at frequencies up to a few kHz, but mostly clustered below 1 kHz. I take this to be the sum total of a ton of circuits doing whatever they are doing, showing up in the end as random noise on the DAC clock.
2. Correlated noise at specific frequencies of tens of Hz. I thought I would be able to correlate it to video refresh rate and such but could not reach that conclusion. Either way, there are activities/timed events in the receiver at these regular intervals and they show up on the DAC clock (see the toy spectrum sketch after this list).
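To make the two behaviors concrete, here is an illustrative-only Python sketch (invented levels, and an invented 60 Hz spur frequency) of how random low-frequency noise and a correlated periodic component would each look if you FFT the DAC clock's timing error:

```
# Illustrative only: a toy jitter spectrum. All amounts and the 60 Hz spur
# frequency are invented for the example, not taken from the measurements.
import numpy as np

rng = np.random.default_rng(1)
fs = 48000                # notional sample rate of the timing-error record
n = 1 << 17               # number of timing-error samples
t = np.arange(n) / fs

# 1. Random low-frequency noise: white noise through a crude one-pole
#    low-pass so most of its energy sits below ~1 kHz.
white = rng.standard_normal(n)
alpha = 1.0 - np.exp(-2 * np.pi * 1000 / fs)
lf_noise = np.empty(n)
acc = 0.0
for i, x in enumerate(white):
    acc += alpha * (x - acc)
    lf_noise[i] = acc
lf_noise *= 200e-12       # scale to ~50 ps rms of random jitter (arbitrary)

# 2. Correlated jitter: a deterministic tone at some tens of Hz, e.g. 60 Hz.
spur = 50e-12 * np.sin(2 * np.pi * 60 * t)

jitter = lf_noise + spur

# Windowed FFT of the timing error.
win = np.hanning(n)
spec = np.abs(np.fft.rfft(jitter * win)) / np.sum(win) * 2
freqs = np.fft.rfftfreq(n, 1 / fs)

peak_bin = np.argmax(spec[1:]) + 1
print(f"largest spectral line: {freqs[peak_bin]:.1f} Hz, "
      f"{spec[peak_bin]*1e12:.1f} ps")
```

The random component raises the broadband floor at low frequencies, while the correlated component shows up as a narrow line poking above it.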
The identical device over S/PDIF would have far lower levels of both. The noise would almost disappear and the correlated jitter spikes would vary and be at much reduced levels.
Especially problematic was that you could corrupt S/PDIF performance just by plugging the HDMI cable in! Mind you, the noise level would be far lower than if you used HDMI as the input, but clearly the HDMI cable was coupling the source and destination electrically, causing noise to bleed into the receiver even when you were using the cleaner S/PDIF input. Or alternatively, maybe the HDMI receiver keeps running even when that input is not selected, and some of its activity bleeds into the rest of the system.
Note that one implementation, the Mark Levinson Processor, managed to keep HDMI noise almost completely out of the way. So we could say that the "protocol" is not the problem, as it can be implemented well. Then again, the notion of slaving audio to video in all cases, and hence forcing the system to deal with video, seems wrong. There could have been a data channel that worked independently of video.