Live, unamplified music is really the only reference...it might not be a practical one to use, but it is the only one nonetheless. I guess if you were only listening to amplified rock music you could argue that hearing someone play live through a guitar amp etc. could also be a reference...but that still puts another layer on top (speakers, amps, mixer, engineer etc.) to muddy the waters.
I don't know what you mean by "our own references". This is pure relativism. Something that changes at your whim is not a reference at all...by definition. A reference is something you "refer to"...a constant, if you will. It needs to be somewhat outside yourself. Sure, it is observed and interpreted by you, but for people with normal hearing, how we each hear the real event won't be so different.
There are recordings that have done an admirable job of capturing a performance accurately...these can be used as a practical reference to establish the quality of a system. If your system reproduces them close to a live experience of something very similar (this relies on a good aural memory of course...or frequent repetition), then you can be reasonably certain that it reproduces what is on other recordings accurately as well...for better or worse.
The Audio Note UK guy, Peter Q., has a pretty good but incomplete point: a system should capture maximum contrast between recordings, because recording quality ranges so widely, from atrocious to sublimely accurate. What is incomplete is that he is advocating for a system with high precision, so that small changes are readily discerned, but that approach says nothing about the accuracy of the system. A system can show the tiniest differences but still be way off the mark in terms of tonality, dynamics, resolution (transparency), imaging, soundstaging etc. Discrimination alone does not make a system accurate. A lot of people fall into this trap...they have über resolution, attack, soundstage etc. and every recording sounds different...but the sound of every recording is far from realistic, even with the best recordings, and therefore wrong.
Do you really know what it is you are striving for in your system? Or, having reached it (so you thought), did you realize that it wasn't what you thought it would be? I decided long ago that there is not a system on this earth that will give me a true live experience for large orchestral works...I have never heard one and I doubt it exists...so I have focused on getting a system that is as realistic as possible for smaller ensembles (jazz, classical). This doesn't mean it won't do big classical well...it does, but not realistically well. Rock and electronic music sounds good (and right?) through it if the recording is good, or harsh and compressed if that is how the recording was made...I don't want to change that, because that would mean introducing deliberate bias to "soften" bad recordings.
I know a guy who changes whole systems on a nearly monthly basis...he has no idea what is correct sounding but just likes changing the aural flavor for the experience of it and for his unabashed love of the gear. Knowing this, it is pointless to talk seriously with him about sound quality, because he just wants to play, and that's fine; that is his defined goal for the whole thing. I suspect that the guys with 4 of this and 5 of that are similar, despite their protestations to the contrary. Now, I have three systems at home, but they have well-defined purposes. One is for "serious" listening, one is for late night listening and one is for TV and background listening...they are all in different parts of the house. Each has only one source of a given type and one pre/amp or integrated amp. Only the "serious" rig is striving for what I have posted above. The background rig is decent enough for me, and my wife likes the looks. The late night rig is in our attic room, where I can listen at night without disturbing anyone, and it is efficient (96 dB single driver) so it works really well at low volumes. It is limited in both highs and lows but still sounds rather nice...great for working while listening, because the big rig always commands my attention.