A precursor to that question is: what is gained from extended listening? The answer is familiarity. Most of the time, we don't know what we are listening to.
Proper double-blind listening tests of music will make the material available in advance, with enough time for the listener to become familiar with it. There should be no time limit on that process, so the person can become firmly trained. In standards groups, the same clips are used over and over again so that this training is not necessary (that creates its own set of problems unrelated to the topic at hand).
So, a roundabout way of saying that well-executed tests, the kind designed for learning rather than for tricking someone into failing, do include provisions like the ones you mention, although obviously not to the level of allowing months to go by before the tests are done.
When I set up blind testing at home, of course I have all the time in the world available to do things to my satisfaction. If I want to spend a month doing it, I can.
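For anyone who wants to try this at home, here is a minimal sketch of the bookkeeping, assuming a simple ABX-style setup where each trial secretly plays either device A or device B and the listener guesses which one it was. The ABX framing, the function names, and the coin-flip scoring are my own assumptions for illustration, not something taken from any standard. Familiarization time is deliberately left out of the code: you take as long as you like before the scored trials start.

```python
import random
from math import comb


def make_trial_order(n_trials, seed=None):
    """Randomly pick which device ('A' or 'B') is played in each trial.

    Hypothetical helper: in a real double-blind setup a second person
    or a switch box would hold this list, not the listener.
    """
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]


def score(answers, truth):
    """Count how many of the listener's guesses match the hidden order."""
    return sum(a == t for a, t in zip(answers, truth))


def chance_of_guessing(correct, n_trials):
    """Probability of getting at least `correct` right out of `n_trials`
    by pure coin-flip guessing (one-sided binomial tail, p = 0.5)."""
    favorable = sum(comb(n_trials, k) for k in range(correct, n_trials + 1))
    return favorable / 2 ** n_trials


if __name__ == "__main__":
    truth = make_trial_order(16, seed=1)
    # Stand-in for real listening: here the "listener" just guesses.
    answers = make_trial_order(16, seed=2)
    n_right = score(answers, truth)
    print(n_right, "of 16 correct;",
          "chance of doing at least this well by guessing:",
          chance_of_guessing(n_right, 16))
```

A small chance-of-guessing number only says the result is unlikely to be luck. It says nothing about whether the test itself was set up properly.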
Of course, no test is perfect. Blind tests can fall victim to many problems which could invalidate part or all of their results. Nothing is more difficult than creating proper audio test protocols. I usually have no problem finding half a dozen flaws in tests published by experts in respected circles such as the AES. This is hard stuff to do right. Worse yet, such tests are too expensive to repeat once problems are found.
Well, I guess my experience is that many pieces of audio equipment that initially sound good (say for the first week or two, and assuming they're burned in) often don't cut the mustard with extended listening time. In other words, those reproduction qualities that initially endear a piece of equipment to our ears are frequently offset by others that grow more annoying with time. I might submit that added detail in certain frequency ranges can initially endear itself to one's ear but, after sometimes an hour, and more often weeks, causes listening fatigue. (All one needs to do is go back ten years or a little more and listen to those pieces of gear sold with "high definition" or similar phrases. They could peel paint off the wall.)
There are so many variables to take into consideration when listening to and evaluating a piece of equipment that it's impossible, in a blind/short-term test, for the brain to select/appraise all of them and come to a valid conclusion. And with audio equipment, assuming nothing is perfect, how does one discriminate between those colorations that the ear can listen through and those that the ear can't?