A short, and undoubtedly insufficient, list (since I'm writing this off the cuff) would be:
1) listener training
2) quiet, single-listener situation, with equipment, acoustics, etc. of appropriate quality
3) negative and positive controls, and stimulus repetition for evaluation of consistency
4) perfect time alignment and level alignment (if either of those is off by much at all, the test will absolutely come out falsely positive)
5) feedback during training and after each individual trial
6) consistent A and B stimuli, which the subject is permitted to know and can return to at any time to refresh their recollection. This is also an element that, if mishandled, can easily make any test come out positive by mistake.
7) transientless, quiet switching between the signals, with extremely low latency. Switch transients can either lower sensitivity or unblind a test, depending on how they arise.
8) the ability to loop the test material under user control
9) of course the setup must be double-blind, the ordering must be varied, etc. All the standard test confounds must be dealt with.
That's just a few, and not even close to a full set, but even that much shows that running a good test isn't easy.
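To make a couple of those items concrete (varied ordering, and repetition so consistency can be evaluated), here is a minimal Python sketch. It's only an illustration under my own assumptions: the function names, the 16-trial count, and the 50% guessing rate are all made up for the example, not part of any particular test software. It randomizes which stimulus plays as the unknown on each trial, then computes how likely a given score would be from pure guessing.

```python
import math
import random


def make_schedule(n_trials, seed=None):
    """Randomly pick 'A' or 'B' as the hidden stimulus for each trial.

    Hypothetical helper. In a real double-blind setup this schedule has to
    stay hidden from both the listener and whoever is running the test.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def binomial_p_value(n_correct, n_trials, p_chance=0.5):
    """One-sided probability of scoring at least n_correct by guessing alone."""
    return sum(
        math.comb(n_trials, k) * p_chance**k * (1 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Assumed example: 16 trials, listener identifies the unknown correctly 12 times.
schedule = make_schedule(16, seed=1)
p = binomial_p_value(12, 16)
print(schedule)
print(f"chance of 12 or more correct out of 16 by guessing: {p:.4f}")
```

With those assumed numbers, 12 of 16 comes out at a bit under 4% by chance. None of this helps if the levels, timing, or switching are off, of course. The statistics only tell you that the listener heard something, not what it was.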