Conclusive "Proof" that higher resolution audio sounds different

So blind testing that misses some of JJ's requirements isn't useless; it is less precise and less certain. It may fail to detect genuine audible differences once those differences are small enough and near enough to being inaudible. For many purposes it is sufficient, and it is valid.

When you start to weigh up the evidence, long-term listening has very little going for it; blind, level-matched comparisons seem to jump ahead in discriminating ability with just those most basic removals of biases and influencing factors.

Agreed.

Heck, if people are a bit open-minded, quite a few with strong opinions have been taken aback when you simply get them to listen sighted and level-matched while comparing gear. Taken aback that some substantial differences they held to be evident suddenly shrink tremendously with simple level-matched comparison listening.

A guy from another forum recently hosted two tests of DACs, one a level-matched ABX and the other a level-matched but sighted comparison. He and the other participants found that even a very small volume difference could be picked up, and that the louder option sounded better, so I agree again that level-matching is very important.
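To give a sense of how that matching might be checked in practice, here is a minimal sketch in Python (purely illustrative; the helper names, the overall-RMS comparison and the 0.1 dB tolerance are my own assumptions, not any standard):

```python
import numpy as np

def rms_db(x):
    """RMS level of a clip in dB (assumes float samples roughly in -1..1)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def check_level_match(a, b, tolerance_db=0.1):
    """Report the RMS difference between two clips and whether it falls
    within the chosen tolerance (0.1 dB here is just an assumption)."""
    diff = rms_db(a) - rms_db(b)
    print(f"Level difference: {diff:+.3f} dB")
    if abs(diff) > tolerance_db:
        gain = 10 ** (-diff / 20)          # linear gain that would bring clip a in line with b
        print(f"Apply gain {gain:.4f} to A (or otherwise re-match levels) before testing")
    return abs(diff) <= tolerance_db

# Toy example with synthetic signals: b is the same 1 kHz tone 0.5 dB quieter
t = np.linspace(0, 1, 44100, endpoint=False)
a = 0.5 * np.sin(2 * np.pi * 1000 * t)
b = a * 10 ** (-0.5 / 20)
check_level_match(a, b)
```

In a real comparison you would of course load the actual files, and might prefer a programme-weighted loudness measure over raw RMS, but the point stands: the check is cheap compared to the bias a small level offset introduces.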

So the more of JJ's list you manage, the better, but to act as if leaving one item off makes a test fully invalid is simply not a very reasonable approach considering all the evidence. And that is without introducing the evidence from psychology about the effects of sightedness, which weighs against sighted listening.

Agreed again, it's what I've been saying for a while on this thread.

Good post BTW, elsdude.
 
Yes, esldude, I agree with what you say - as the differences become smaller, the need for implementing the controls becomes greater.

We are now mostly at the stage in audio development where we are talking about small differences - so this is the de facto area we are interested in - hence the growing need for these controls. Say that these two controls address the most influential biases & that those should therefore be the first to be eliminated. Why would you then assume that all else is of no significance for small differences? It just seems to be a large leap of faith. Maybe it's because the others are difficult to implement? But are they?

It's relatively easy to implement a blind test, a bit more difficult to properly implement level matching in tests. So one way to prove that you aren't just blindly taking a leap of faith is to implement maybe the next easiest control - positive & negative controls. They are probably less difficult to implement than level matching, but it is something that is never seen to be done in forum-organised blind tests. It's not that they aren't known about - many (including J_J) have suggested such controls for a long time & still nobody seems interested in implementing them. The question is why?

These positive & negative controls are like a supervisor in the test, in that they check that the whole listening procedure (people, equipment, material, etc.) doesn't have a propensity towards returning false negatives or false positives. As I said before, it's the equivalent of ensuring your measurement equipment is fully working & sensitive enough over the range you need to measure - it's a calibration step, if you like.

I would think implementing controls is a good step. I would guess it isn't done because it is more complex to implement in a forum-type test.

Now you have me at least partly wrong in your reply. I am not assuming all the other things in blind testing are of no significance. It is no leap of faith.

Another type of testing I have not seen on forums is adaptive testing. I don't see it equal to control groups, but it would help with people owning equipment of differing abilities and people having different hearing abilities. It would help if software like the Foobar ABX plug-in were available to implement it. Even when not trying to determine some new threshold, it would be instructive to participants as to what some of the limits of their hearing/equipment are. For those not familiar, I am talking of the type of testing where, if you are testing for some known audible variable, you start at a clearly audible level. Once you get 3 correct choices, the variable is made smaller, and again if you get 3 correct it gets smaller again. If you miss one, it goes back to the previous level and you start over. With a few trials you determine your threshold for the variable under test.
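If it helps to see the idea concretely, here is a rough sketch of that kind of staircase in Python (the function names are made up for illustration, and the 3-correct-to-step-down / 1-miss-to-step-up rule, the reversal-based stopping point and the specific dB levels are just assumptions about one common variant, not a particular published protocol):

```python
import random

def run_staircase(can_hear, levels, reversals_to_stop=6, max_trials=200):
    """Simple 3-down / 1-up adaptive staircase.

    can_hear(level) -> True/False : one trial at the given difference level
    levels                        : ordered list, largest (easiest) difference first
    Stops after a fixed number of direction reversals (or max_trials) and
    returns the level it ends on, a rough estimate of the listener's threshold.
    """
    idx, streak, reversals, last_move, trials = 0, 0, 0, None, 0
    while reversals < reversals_to_stop and trials < max_trials:
        trials += 1
        if can_hear(levels[idx]):
            streak += 1
            if streak == 3:                       # three correct in a row
                streak = 0
                if idx < len(levels) - 1:
                    idx += 1                      # make the difference smaller
                if last_move == "up":
                    reversals += 1
                last_move = "down"
        else:
            streak = 0
            if idx > 0:
                idx -= 1                          # go back to the easier level
            if last_move == "down":
                reversals += 1
            last_move = "up"
    return levels[idx]

# Toy example: a simulated listener who reliably hears differences above ~0.4 dB
levels_db = [3.0, 2.0, 1.0, 0.5, 0.25, 0.1]
listener = lambda db: db > 0.4 or random.random() < 0.5   # pure guessing below threshold
print("Converged near", run_staircase(listener, levels_db), "dB")
```

Something like this could sit behind an ABX plug-in, so each participant ends a session with a rough idea of where their own threshold lies for the variable under test.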

But all such matters are another level of understanding and education needed for forum testing. It doesn't mean it isn't worth doing, just that it's difficult. I once tried to do a simple forum test using 2-alternative forced choice. I could never explain it sufficiently well to enough people to get past the complaint that you should have a choice saying the two sounded the same, rather than being forced to choose one. Probably my own inadequacies in communicating with other people.
 
Another type of testing I have not seen on forums is adaptive testing. I don't see it equal to control groups,

Sorry, I should explain it - what I mean by positive & negative controls are not control groups but rather a known difference that should be audible, which would test for false negatives, i.e. do listeners hear this difference? If none do, then the test setup is not resolving enough. If some do & some don't, then those listeners who don't should be eliminated from the test.

Similarly, two devices or samples exactly the same should return no audible difference to test for false positives.

They are probably not that difficult to include in any test, I believe?
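As a rough illustration of how such controls could be slipped into a forum-style file test, here is a sketch in Python (the helper names, trial layout, number of hidden controls and pass criterion are only my assumptions, not an established recipe):

```python
import random

def build_trials(pairs_under_test, positive_pair, negative_pair, n_controls=2):
    """Mix the pairs being tested with hidden controls.

    positive_pair : two clips with a known audible difference (e.g. +1 dB) -> should be heard
    negative_pair : two bit-identical clips -> should NOT be heard as different
    """
    trials = [("test", p) for p in pairs_under_test]
    trials += [("positive", positive_pair)] * n_controls
    trials += [("negative", negative_pair)] * n_controls
    random.shuffle(trials)                      # the listener never knows which is which
    return trials

def screen_listener(responses):
    """responses: list of (kind, heard_difference) collected after the session.
    The listener's 'test' answers count only if the controls behaved."""
    pos = [heard for kind, heard in responses if kind == "positive"]
    neg = [heard for kind, heard in responses if kind == "negative"]
    return all(pos) and not any(neg)            # caught every ringer, no phantom differences

# Example: two real comparison pairs plus hidden ringers (file names are placeholders)
trials = build_trials([("hires_A.wav", "rb_A.wav"), ("hires_B.wav", "rb_B.wav")],
                      positive_pair=("clip.wav", "clip_plus_1dB.wav"),
                      negative_pair=("clip.wav", "clip.wav"))

session = [("test", True), ("positive", True), ("negative", False),
           ("test", False), ("positive", True), ("negative", False)]
print("Listener's test answers usable:", screen_listener(session))
```

A missed ringer or a "difference" reported between identical files then flags the setup or the listener before any conclusions are drawn from the real trials.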
 
Sorry, I should explain it - what I mean by positive & negative controls are not control groups but rather a known difference that should be audible, which would test for false negatives, i.e. do listeners hear this difference? If none do, then the test setup is not resolving enough. If some do & some don't, then those listeners who don't should be eliminated from the test.

Similarly, two devices or samples exactly the same should return no audible difference to test for false positives.

They are probably not that difficult to include in any test, I believe?

Well, I have seen a few tests that were done that way. One of the files was a ringer that should be picked up. In one test I remember in some detail, of the 6 files being compared one was 1 dB louder. No one caught that the difference was the increased loudness, yet all participants detected that file with high reliability blind. That would meet your description, apparently. The comparison was ostensibly between files that were originally of different sample rates. All were picked at chance levels except for the loud one, which was also one of two files at Redbook resolution.

The adaptive or weighted adaptive testing protocol would also do what you wish and let you see each individual listener's personal threshold.
 
So blind testing that misses some of JJ's requirements isn't useless; it is less precise and less certain. It may fail to detect genuine audible differences once those differences are small enough and near enough to being inaudible. For many purposes it is sufficient, and it is valid.

It may, but let's not lose sight of what has actually been shown here, on this board, in this thread -- Amir did ID very subtle differences, using ABX testing lacking most of JJ's controls, and his result was repeated by a handful of participants. Here. Right here. Evidence, not proof, but the only evidence presented here.

It could be pretty easy to be distracted from that fact, here at the end of an incredibly long, circular argument made not just against the efficacy, but against the validity, of exactly what this thread demonstrated the efficacy of.

Tim
 
It may, but let's not lose sight of what has actually been shown here, on this board, in this thread -- Amir did ID very subtle differences, using ABX testing lacking most of JJ's controls, and his result was repeated by a handful of participants. Here. Right here. Evidence, not proof, but the only evidence presented here.

It could be pretty easy to be distracted from that fact, here at the end of an incredibly long, circular argument made not just against the efficacy, but against the validity, of exactly what this thread demonstrated the efficacy of.

Tim

Tim,

The referred controls for false negatives are needed only if the experiment returns a null (a negative, the usual result of poorly carried-out tests). The statistical analysis of the results is the "control" of a positive - proving it was not due to chance and that the difference exists! Unless you doubt Amir's report of the facts, it was proved!

BTW, what you insist on calling very subtle is reported as clearly audible, and not as subtle, by many experts in this forum. Each of us can pick whom he trusts.
 
Well, I have seen a few tests that were done that way. One of the files was a ringer that should be picked up. In one test I remember in some detail, of the 6 files being compared one was 1 dB louder. No one caught that the difference was the increased loudness, yet all participants detected that file with high reliability blind. That would meet your description, apparently. The comparison was ostensibly between files that were originally of different sample rates. All were picked at chance levels except for the loud one, which was also one of two files at Redbook resolution.

The adaptive or weighted adaptive testing protocol would also do what you wish and let you see each individual listener's personal threshold.
Yeah, that's the sort of idea - I haven't seen this in any test yet - got a link? 1 dB of loudness may well be a bit too gross a difference, & I guess herein lies the difficulty, which is addressed by your "adaptive testing protocol" but is much more difficult to administer (tiring, boring & maybe self-defeating). The other issue is that a focus on loudness or frequency alone is not enough - sensitivity to timing differences needs to be examined too.
 
It could be pretty easy to be distracted from that fact, here at the end of an incredibly long, circular argument made not just against the efficacy, but against the validity, of exactly what this thread demonstrated the efficacy of.

Tim
Tim, you miss the whole point of what I'm saying as micro has pointed out to you & again are incorrectly interpreting it & making a strawman argument from it.
Maybe this is why the thread has generated multiple posts?
 
Tim,

The referred controls for false negatives are needed only if the experiment returns a null (a negative, the usual result of poorly carried-out tests). The statistical analysis of the results is the "control" of a positive - proving it was not due to chance and that the difference exists! Unless you doubt Amir's report of the facts, it was proved!

First let me make sure I understand what you're saying here -- "Unless you doubt Amir's report of the facts." That seems to be asking if I doubt Amir's report of his positive ID. I do not. Read the post above that you quoted and responded to and it will make that clear.

Subtle? I would call anything that requires someone to know what differences to listen for, and to listen to specific short passages that highlight those differences in order to hear them, subtle. But your definition may vary.

Tim
 
Tim, you miss the whole point of what I'm saying as micro has pointed out to you & again are incorrectly interpreting it & making a strawman argument from it.
Maybe this is why the thread has generated multiple posts?

John, you've made your point so many times, from so many angles, I couldn't possibly have missed it. But regardless of what you think your point is, here's what has happened here:

Amir started a thread reporting that using ABX, he had identified differences between RB and hires files. His results were repeated by several other participants.

You have argued, relentlessly, that ABX is ineffective and invalid unless strict controls are used, most of which were not used in these examples of the efficacy of ABX. Micro should be talking to you. You are evidently the one who doubts "Amir's report of the facts," not I.

Regarding "validity" and JJ's controls, some of his controls are good. Some of them are useless without being defined in much greater detail than has been done here. All of them are not enough for "validity" if what you're looking for is scientific proof. They are, as they stand, little more than a distraction from a result that, while inconclusive, supports the efficacy of ABX testing in the very thread in which you are attempting to deny it.

Perhaps you are operating on a private definition of "validity." That could easily keep you supporting the unsupportable and assuming everyone was missing your point for days.

Tim
 
Tim, stop stating that I am trying to deny ABX testing, please! It's disingenuous & should be retracted.
 
Tim, stop stating that I am trying to deny ABX testing, please! It's disingenuous & should be retracted.

Have you not denied the validity of ABX testing without all of the controls you've listed here, John? And therefore, by definition, have you not denied the validity of Amir's results and the other positive results reported in this thread? I don't really want to go through all your posts in this thread and show you your own words. Owning up to them would be much less embarrassing for you.

Tim
 
Yes, guys, you seem to be so caught up in chasing this that you miss the obvious. As Micro already stated a number of posts back:
Tim,

The referred controls for false negatives are needed only if the experiment returns a null (a negative, the usual result of poorly carried-out tests). The statistical analysis of the results is the "control" of a positive - proving it was not due to chance and that the difference exists! Unless you doubt Amir's report of the facts, it was proved!

I know it's a hard pill to swallow but it's science & sometimes science is tough :)

Now you can try to prove Amir's results are the result of other factors, but that is up to you to decide to do. It is a continuing dialogue over on AVS, if anyone is interested - IMD in the playback, a resampler not up to scratch, timing slew as a result of resampling & probably some others. Are these people wrong to investigate this? No, it's a necessary part of analysis. Do these investigations relate to exactly the points I'm making about controls & null results? Yes, but the shoe is on the other foot - they are trying to prove something akin to a false positive result, i.e. that the files, if resampled correctly & played back through "proper" equipment, would show no audible differences.

I'm really not sure why some of you guys don't see the whole picture, just a one-sided view of it?

What seems to be eluding some people is that a positive ABX result is it - it statistically proves that the results are not chance - there is only one reason for the result: the tester is consistently hearing differences. A null result can arise for all sorts of reasons, & controls are needed if you want to be able to draw any conclusions about a null result - which should never be done anyway, yet many here want to draw conclusions without dealing with the controls.
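Just to make "statistically proving" concrete, the arithmetic is a simple binomial tail probability; here's a sketch (the 16-trial / 13-correct figures are invented purely for illustration, not anyone's actual scores):

```python
from math import comb

def abx_p_value(n_trials, n_correct):
    """Probability of getting at least n_correct right out of n_trials
    by pure guessing (each trial treated as a 50/50 coin flip)."""
    return sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1)) / 2 ** n_trials

# Invented example: 13 correct out of 16 trials
p = abx_p_value(16, 13)
print(f"Chance of guessing this well or better: {p:.4f}")   # ~0.0106, i.e. about 1%
```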

So, Tim, please retract what you are trying to suggest is my stance - namely, that I'm denying ABX testing!!
 
The referred controls for false negatives are needed only if the experiment returns a null (a negative, the usual result of poorly carried-out tests).

Micro, do you consider a null result to be due to a poorly carried out test when the participants reported differences sighted first?

Also, have you any evidence to support your view that null results are usually the result of poorly carried out tests?

Thanks.
 
Sit a bunch of guys in front of a HiFi and let them listen to their hearts' content to A and B (whatever they may be), level-matched and sighted.

Ask them whether they heard differences and as per usual, they'll probably say they did.

Remove the knowledge of whether A or B is being used at any given time.

Ask them to identify A from B, or what X is. There's plenty of ways to do this and we all know of them.

If none can reliably do so then it's highly likely that they were imagining the differences they reported when sighted.

It is not highly likely that the sighted and blind results differed because of a lack of controls in the test procedure.

It is highly likely that the sighted and blind results differed because of the controls that were in place in the test procedure.

This would be a perfectly valid blind-test.
 
The other mistake in logic being made here is that some people take the positive result of one invalid test (sighted) and the negative result of another invalid test (a blind test lacking sufficient controls) and conclude that, because there has been a change, the negative result is therefore correct - forgetting or ignoring that its result is based on an invalid test.

What we have seen on this thread are many sighted reports of there being a difference between high-res & RB audio - a positive result based on an invalid test. Now a positive result based on a blind test comes along & confirms all the sighted reports. Does this mean that all the sighted reports are now correct? By the same logic as is being used here, it should be the natural conclusion & therefore it should also be the natural conclusion that sighted reports are an accurate reflection of reality.

BTW, I'm just using the logic demonstrated on this thread in my statements above - please don't take them out of context & quote them to create strawman arguments for me to defend!
 
Yes, guys, you seem to be so caught up in chasing this that you miss the obvious. As Micro already stated a number of posts back:


I know it's a hard pill to swallow but it's science & sometimes science is tough :)

Now you can try to prove Amir's results are the result of other factors, but that is up to you to decide to do. It is a continuing dialogue over on AVS, if anyone is interested - IMD in the playback, a resampler not up to scratch, timing slew as a result of resampling & probably some others. Are these people wrong to investigate this? No, it's a necessary part of analysis. Do these investigations relate to exactly the points I'm making about controls & null results? Yes, but the shoe is on the other foot - they are trying to prove something akin to a false positive result, i.e. that the files, if resampled correctly & played back through "proper" equipment, would show no audible differences.

I'm really not sure why some of you guys don't see the whole picture, just a one-sided view of it?

What seems to be eluding some people is that a positive ABX result is it - it statistically proves that the results are not chance - there is only one reason for the result: the tester is consistently hearing differences. A null result can arise for all sorts of reasons, & controls are needed if you want to be able to draw any conclusions about a null result - which should never be done anyway, yet many here want to draw conclusions without dealing with the controls.

So, Tim, please retract what you are trying to suggest is my stance - namely, that I'm denying ABX testing!!

There's no need to retract what I haven't said, John. Odd to quote myself, but here goes:

You have argued, relentlessly, that ABX is ineffective and invalid unless strict controls are used, most of which were not used in these examples of the efficacy of ABX. Micro should be talking to you. You are evidently the one who doubts "Amir's report of the facts," not I.

I didn't say that you broadly denied the validity of ABX testing, John. And I didn't realize that you were applying those controls only in the event of a negative result, because I honestly didn't expect you to take such an absurd position. Controls are applied before testing, to all tests, before you have a result. Applying controls after the result, to a specific result, is the opposite of control; it is influence. I know it's a hard pill to swallow but it's science & sometimes science is tough.

Tim
 
