There is something wrong with what's being reported here, because at least one ABX test has identified #22 speaker cables at a fairly short length.
And, of course, 0.1 dB, if it's broadband, is getting up to the level of audibility. You will not hear "loudness" differences; rather, they will appear as "quality" differences.
Typically matching is required to 0.2 dB, but all you need is 0.1 up and 0.1 down to create an audible situation, albeit one that sounds like a quality change rather than a loudness change.
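To put those figures in perspective, here is a minimal sketch of what 0.1 dB and 0.2 dB mean as linear ratios, using the standard decibel definitions (20·log10 for amplitude, 10·log10 for power). The function names are only illustrative.

```python
def db_to_amplitude_ratio(db):
    """Voltage/amplitude ratio for a level difference in dB (20*log10 convention)."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Power ratio for a level difference in dB (10*log10 convention)."""
    return 10 ** (db / 10)


for db in (0.1, 0.2):
    amp = db_to_amplitude_ratio(db)
    pwr = db_to_power_ratio(db)
    print(f"{db} dB: amplitude +{(amp - 1) * 100:.2f}%, power +{(pwr - 1) * 100:.2f}%")
```

So 0.1 dB is a voltage change of only a little over 1%, which is why it tends to show up as a change in character rather than an obvious change in volume.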
This raises an interesting point, because many amps and CD players/DACs will have a ±0.1 dB difference somewhere over the 20 Hz-20 kHz frequency range, usually at multiple points. This is compounded by the fact that source/output impedance, noise, distortion, filters, etc. also vary across the frequency range (I appreciate this variation will also depend upon the type of device, whether digital source, amp, etc.).
Matching exactly at 1 kHz should not be seen as automatically level-matching loudness across the whole range; just adding to the discussion.
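As a rough sketch of that point: two devices can be matched exactly at 1 kHz and still differ by more than 0.1 dB elsewhere. The response figures below are made up purely for illustration (they are not measurements of any real device), and the function name is hypothetical.

```python
def deviations_after_matching(resp_a, resp_b, ref_hz=1000, tol_db=0.1):
    """Level-match B to A at ref_hz, then return the frequencies where
    the two responses still differ by more than tol_db (values in dB)."""
    offset = resp_a[ref_hz] - resp_b[ref_hz]  # gain applied to B to match at ref_hz
    remaining = {}
    for hz in sorted(resp_a):
        diff = resp_a[hz] - (resp_b[hz] + offset)
        if abs(diff) > tol_db:
            remaining[hz] = round(diff, 3)
    return remaining


# Hypothetical responses in dB, keyed by frequency in Hz (illustrative only)
device_a = {20: -0.4, 100: 0.0, 1000: 0.0, 10000: 0.1, 20000: -0.2}
device_b = {20: 0.3, 100: 0.5, 1000: 0.5, 10000: 0.55, 20000: 0.1}

print(deviations_after_matching(device_a, device_b))
```

With these invented numbers the two devices are identical at 1 kHz after matching, yet still differ by about 0.2 dB at both frequency extremes.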
That said, I agree there must be level matching, otherwise it does present what JJ says; but in reality any truly valid test would need to be in a completely enclosed/controlled environment, meaning bespoke scientific equipment to control and simulate all those variables.
Personally I feel any such test measurement should also use a more complex, mathematically defined note rather than a single tone (more like a specific major chord with a specifically defined attack and decay), analysed as a true envelope with time/amplitude/harmonics. This is just me, although I know some others in the audio world with scientific backgrounds have had a similar thought (not naming them and dragging them into this discussion, as there is no answer that satisfies all sides, which is why we are now at 174 pages).
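A minimal sketch of the kind of test signal I have in mind, assuming a C major triad in equal temperament with a linear attack and an exponential decay. The envelope shape, timings and sample rate are my own arbitrary choices for illustration, not any established test standard.

```python
import math


def chord_with_envelope(freqs_hz, duration_s=1.0, sample_rate=48000,
                        attack_s=0.01, decay_tau_s=0.3):
    """Sum of sine tones shaped by a linear attack and an exponential decay.

    Returns a list of float samples in the range [-1.0, 1.0].
    """
    n = int(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        attack = min(t / attack_s, 1.0)                      # ramps 0 -> 1 over attack_s
        decay = math.exp(-max(t - attack_s, 0.0) / decay_tau_s)  # decays after the attack
        tone = sum(math.sin(2 * math.pi * f * t) for f in freqs_hz) / len(freqs_hz)
        samples.append(attack * decay * tone)
    return samples


# C4, E4, G4 in equal temperament (A4 = 440 Hz)
c_major = [261.63, 329.63, 392.00]
signal = chord_with_envelope(c_major)
```

Because every parameter is defined, the same signal can be regenerated exactly and its envelope and harmonics compared before and after the device under test.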
I would like to see a double-blind ABX test in a controlled environment, using various music, where two identical sources differ only in that one has zero distortion and the other has a maximum of 1% introduced distortion (low order, not skewed to high order), using different music each time and hitting a statistical 85%+ over 15 attempts. I appreciate some testing has been done to 4% (and this was not in the same environment), but I am curious just how well anyone would pass 1% in that specific ABX environment.
My money would be on zero across a group, not just a single listener (I could be eating humble pie), but technically it should be audible.
Anyway, a different point, but valid to objective listening.
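For what the 85%+ over 15 attempts criterion means statistically, here is a small sketch of the chance of reaching it by pure guessing, assuming each trial is an independent 50/50 guess (the usual null hypothesis for ABX). The function name is only illustrative.

```python
from math import comb, ceil


def chance_of_at_least(hits, trials, p_chance=0.5):
    """Probability of scoring `hits` or more out of `trials` by guessing alone
    (upper tail of a binomial distribution)."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )


trials = 15
needed = ceil(0.85 * trials)  # smallest whole number of hits that reaches 85%
p = chance_of_at_least(needed, trials)
print(f"{needed}/{trials} or better by guessing: p = {p:.5f}")
```

On that assumption, 85% of 15 means at least 13 correct, and the odds of a listener managing that by luck are 121 in 32768, well under 1%.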
And so, expanding upon the last point but coming back to 0.1 dB being noticeable:
JJ, has any randomised double-blind ABX test been done with two identical sources, apart from one having an introduced 0.1 dB loudness difference, with the listener identifying that source 85% of the time?
I appreciate it is more complex than this, because we could be talking about a consistent 0.1 dB or peaks/dips over the 20 Hz-20 kHz FR, and the results may differ; anyway, I am interested if this is a controlled ABX test you know about.
Cheers
Orb