I found an article that addresses this question specifically here.
It gives a chart of research results spanning 60 years, and the lowest JND listed is 0.25 dB, from F.E. Toole and S. Olive, "The Modification of Timbre by Resonances: Perception and Measurements", JAES vol. 36, no. 3, March 1988, pp. 122-142.
The author states: "Toole and Olive, on the other hand, in their 1988 study used pink noise for their acoustic signal source and determined that a 5 kHz resonance, with Q = 1 was just detectable at .25 dB." But later he adds: "The .25 dB figure quoted from the Toole & Olive research seems to contradict this [his JND of 0.75 dB or 1 dB], but consider the filter Q = 1. That's a pretty broad chunk of the audible spectrum over which that resonance exists. With the ear-brain combo performing an integration across that broad a portion of the audible spectrum than its easy to see how a large amount of acoustical energy is captured, leaving a change that small noticeable. However, pink noise is not real world and one thing my research has shown is that the hearing process reacts very differently to different types of sound; a .25 dB detectable difference using real music just isn't plausible and the research supports that."
He goes on to state "In this particular article, I settled on a minimum discernable difference dB value of .75 - 1.0. My experience has shown that this is what the average listener, under average listening conditions, listening to music played back through typical consumer-grade audio gear will be able to clearly identify - and do so repeatedly."
So I'm still looking for any research that backs up the claim that a 0.1 dB amplitude difference is noticeable as a quality difference when playing music. Any links, ESL?
I am not at home and can't provide any now. It should be pretty easy to find some references to that.
If you match levels to within 0.1 dB, your tests won't be corrupted by loudness issues. Could the threshold be higher at some frequencies than others? Maybe; the JND results indicate that is possible (and btw the JND is not 0.1 dB, that's the confusion again). Just match to 0.1 dB and you need not worry.
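For a sense of scale, here is the arithmetic behind a 0.1 dB match. It's only a sketch using the usual 20*log10 amplitude definition; the function names and the two voltmeter readings are made up for illustration.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a voltage/amplitude ratio."""
    return 10 ** (db / 20)

def level_difference_db(v_a, v_b):
    """Level of RMS reading v_a relative to RMS reading v_b, in dB."""
    return 20 * math.log10(v_a / v_b)

# How big is each step as a plain amplitude change?
for db in (0.1, 0.25, 0.5, 1.0):
    ratio = db_to_amplitude_ratio(db)
    print(f"{db:>4} dB -> amplitude x{ratio:.4f} ({(ratio - 1) * 100:.2f}% change)")

# Hypothetical RMS voltage readings for two devices playing the same test tone
v_a, v_b = 2.000, 2.020
diff = level_difference_db(v_b, v_a)
print(f"B is {diff:+.3f} dB relative to A; within 0.1 dB: {abs(diff) <= 0.1}")
```

So 0.1 dB works out to a bit over 1% in voltage, which is why a meter reading is a more dependable way to match than doing it by ear.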
The broad (low-Q) 0.25 dB response differences are discernible with pink noise and some other signals. However, I don't know that they are heard as louder. They are heard as different. JNDs for hearing a change in loudness are larger than for hearing two signals as different.
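To see how broad a Q = 1 bump at 5 kHz really is, you can compute the response of a digital peaking filter. This is a rough sketch and not the filter used in the study: I'm assuming the common "Audio EQ Cookbook" (RBJ) peaking biquad and a 48 kHz sample rate, and the function names are my own.

```python
import cmath
import math

def peaking_coeffs(f0, q, gain_db, fs):
    """Peaking-EQ biquad coefficients (b, a) using the RBJ cookbook formulas."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin)
    a = (1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin)
    return b, a

def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

FS = 48000  # assumed sample rate
b, a = peaking_coeffs(5000, 1.0, 0.25, FS)
for f in (500, 1000, 2500, 5000, 10000, 20000):
    print(f"{f:>6} Hz: {response_db(b, a, f, FS):+.3f} dB")
```

The printout shows the boost tapering off gradually on both sides of 5 kHz rather than sitting in a narrow spike, which is the "broad chunk of the spectrum" point.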
Why not make some pink noise, put some broad gentle peaks in the response, and listen blind? Should be interesting and educational for you. Ditto with some music files. Make one 0.5 dB louder and do an ABX in Foobar on it. See what happens and how it sounds/feels for you. Then try 0.25 dB.
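If you don't have an editor handy, the level-offset half of that experiment can be sketched in plain Python. Assumptions on my part: the Voss-McCartney trick gives noise that is only approximately pink, the file names are placeholders, and no dither is applied when writing 16-bit files.

```python
import math
import random
import struct
import wave

def pink_noise(n, rows=12, seed=1):
    """Approximate pink noise (Voss-McCartney): sum of held random values."""
    rng = random.Random(seed)
    held = [rng.uniform(-1, 1) for _ in range(rows)]
    out = []
    for i in range(1, n + 1):
        # Row k is refreshed once every 2**(k+1) samples (trailing zeros of i).
        k = (i & -i).bit_length() - 1
        if k < rows:
            held[k] = rng.uniform(-1, 1)
        out.append(sum(held) / rows)
    return out

def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in dB."""
    g = 10 ** (gain_db / 20)
    return [s * g for s in samples]

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    return 10 * math.log10(sum(s * s for s in samples) / len(samples))

def write_wav(path, samples, fs=44100):
    """Write mono 16-bit PCM, clipping to [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(frames)

FS = 44100
reference = apply_gain_db(pink_noise(FS * 2), -12.0)  # 2 s clip with headroom
louder = apply_gain_db(reference, 0.5)                # the +0.5 dB version
print(f"level difference: {rms_db(louder) - rms_db(reference):.3f} dB")
# To get files for an ABX comparison, uncomment:
# write_wav("reference.wav", reference, FS)
# write_wav("plus_0p5dB.wav", louder, FS)
```

Load the two files into the ABX comparator, then change 0.5 to 0.25 and try again. For music files, the gain function in any audio editor does the same job as `apply_gain_db`.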