Conclusive "Proof" that higher resolution audio sounds different

I'm not Arny, but no, I don't think you read correctly. He's saying that he would prefer better monitoring and control to prevent possible errors and cheating that could invalidate the results. It sounds like you're back on your personal definition of valid again, the one that makes the absence of perfect (by your definition) controls and protocols the same as none (sighted long-term listening). This is a misuse of "valid" in the service of rationalization...

val·id

(of an argument or point) having a sound basis in logic or fact; reasonable or cogent.
"a valid criticism"

...and it invalidates your argument. :)

Tim
The problem though, Tim, is who created the errors in the first place and who is credible in accepting the results of a test, because it comes across that Arny does not accept the results of the tests Amir has done, even though he proposed them in the first place.
This is unfortunately further compounded by the ABX test being casually pushed for many years and its results accepted as given; especially in the case of no audible difference with digital files (say hi-rez vs 16-bit, transparency of downsampling/decimation, etc.).
This then goes full circle with regards to listener training/experience/practice... again :D

Bowing out now because what is raised will not be concluded to a satisfactory level with all parties.
Cheers
Orb
 
I'm not stuck in the circular argument, John. You are. I'm just sticking my head in from time to time to counter that erroneous argument with a bit of reality. Validity, in the sense that you have thrown it around here, is not a question of semantics. You have very plainly stated that if JJ's list of controls and protocols is not met completely, any test is invalid and no better than no test at all (i.e.: long-term casual listening). You are wrong, and your error is rooted in a misunderstanding or misuse of the term "valid." You keep misusing it, so you are the one who is circling. It's pretty simple, really.

Tim
+ Another.
 
I'm not stuck in the circular argument, John. You are. I'm just sticking my head in from time to time to counter that erroneous argument with a bit of reality. Validity, in the sense that you have thrown it around here, is not a question of semantics. You have very plainly stated that if JJ's list of controls and protocols is not met completely, any test is invalid and no better than no test at all (i.e.: long-term casual listening). You are wrong, and your error is rooted in a misunderstanding or misuse of the term "valid." You keep misusing it, so you are the one who is circling. It's pretty simple, really.

Tim

OK, Tim, when you are ready to actually deal with the points I raised then we can have a meaningful discussion - this "no, I'm not, you are" type of debate goes nowhere.

I've stated it before & will re-state it again - a test has a particular target that it is examining & a sensitivity range - in this instance the target is establishing if there is an audible difference between RB & 24/96 audio files. Now, we are all agreed that it is going to be a small difference & therefore the test's sensitivity requirements are defined by this stipulation. When the order of sensitivity gets down to the JND or small-impairment range, more controls are needed than when testing for large impairments (20% distortion, for instance).

If the appropriate controls that are necessary to guarantee this sensitivity are not in place, then the results are not to be trusted. I call this an invalid test, as I said before. You seem to want to say that some results may well be correct & some wrong. The problem with your position is that you have no way of evaluating which are correct & which are wrong without further testing, i.e. controlling the factors that should have been controlled in the first place.
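To make the sensitivity point concrete, here is a minimal sketch in Python of why short ABX runs are insensitive to small differences. The 16-trial length, the 12/16 pass criterion, and the 60% detection rate are illustrative assumptions, not figures taken from this thread:

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of k or more correct answers out of n independent
    trials, where each trial is correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n = 16   # trials in one ABX run (illustrative)
k = 12   # a common pass criterion: 12/16 keeps the guessing risk below 0.05

p_guess  = p_at_least(k, n, 0.5)   # false-positive rate for a pure guesser
p_detect = p_at_least(k, n, 0.6)   # pass rate for a weak but real ability

print(f"Chance a pure guesser passes:  {p_guess:.3f}")
print(f"Chance a 60% detector passes:  {p_detect:.3f}")
```

With these numbers a pure guesser passes under 4% of the time, but a listener with a real yet weak ability still fails more often than not, which is exactly why small-impairment tests need more trials and tighter controls than tests for gross differences.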

ArnyK is citing certain criteria as his reasons for dismissing the test results as "invalid" - things like honesty, IMD, etc. What is different about his position vs mine? If appropriate controls were in place he wouldn't have these objections to the results - they would already have been dealt with by the controls - that's what they're for!! He has even agreed to this, but followed it with the ad infinitum proviso - that in his view the necessary controls are never-ending: "It goes on..........."

So, please don't post back "no, you're wrong" & expect a reply!
 
+1 here.

What I find most interesting is how long-term listening, according to that view, would be the superior alternative to the "flawed" forum ABX. Well, we know where everybody stands. You can have the last word, John ;)
Frantz, this is specifically about ArnyK's statement & my summary of it, which Tim said was wrong. Don't bring back side issues which muddy the water. This is the old argumentation trick - "well if ABX is wrong then ..........." We are specifically talking about one test here - not comparing it to another test.
 
The above has been generally understood for decades, if not half a century or more. The controversy has been over what constitutes such a being, and how such beings might come into existence.
It certainly has been decades, seeing how the ITU BS.1116 recommendations have been around for many years. The question then is how come, when DIY tests such as Meyer and Moran are done, there are no trained listeners in the group. And there is no training material to familiarize the testers with the flaws and use that to screen out people who do not have critical listening skills.

Here is you in the debate thread on this topic:

Well trained ears exist and of course there are a lot of people of all ages with damaged hearing. The idea that there are people with hearing that is orders of magnitude better than that of most people with undamaged hearing and adequate preparation has never developed a lot of experience-based traction. If you understand how the ears work, they are like the rest of the human body - there are definite limits that aren't that hard to run into during testing.

The last sentence is completely opposed to accepting the notion of expert listeners. The training is not about how the ear works. It is about how the brain works. My ears did not change from before being trained to after being trained just a few short months later. Neither did the "rest of the human body." Yet my ability to hear small distortions grew to be far greater than that of non-trained listeners. Witness the 320 kbps MP3 test I posted yesterday. Most people can't tell the difference between MP3 and the original at 128 kbps. As such, they won't have any ability to hear artifacts at 320 kbps.

I am not saying these things to brag. It simply is a fact that a trained listener, like a trained doctor, can do better than others. Just because you are "human" and have a pair of ears, it doesn't mean you are fit for this task.

Here is you again on this topic: http://www.avsforum.com/forum/91-au...curate-sound-reproduction-3.html#post22028401

Which just points out that both golden ears and golden brains don't exist. The human body has finite limits, many of which we already know.

I think it is only now, after seeing the results of these tests that we have acceptance of training mattering. And the need for critical listeners in such tests.

But sure, if we are in agreement, let's see the list of DBT ABX tests that employed listeners trained in the artifacts they were testing. Do you have any, Arny?

Historically, there has been a striking lack of objective means used to establish whether or not a certain individual was a trained listener, and for what. Most so-called trained listeners were self-appointed and allegedly proved their mettle by means of sighted evaluations.
So what? A lot of people claim to be audio experts on forums. I don't see anyone rallying to get them all banned from these discussions. On the contrary, totally unqualified people who have never done a single one of these tests in the industry or research, run around and repeat conclusions from tests whose conditions they don't understand.

Ditto. A key requirement for positive results in ABX tests is the existence of an actual audible difference.
That is circular logic, Arny. If you know there is an actual audible difference, then why bother doing the test? It makes no sense to say that.

Since the previous gold standard for listening evaluations was the totally-flawed sighted evaluation method, there has been a lot of confusion about what constitutes an audible difference. Scientists who had been doing reliable listening tests for the better part of a century knew, but audiophiles and audio practitioners had been largely kept in the dark.
Changing the topic to sighted evaluations is a misdirection so obvious that we had better not resort to it, lest we appear to be without answers in the current discussion, which is double blind testing with computer control. Double blind tests are not on trial. What is on trial is unskilled people creating and running them.

But yes, you are right that DIY tests by a group of audiophiles can be full of errors because they are not "scientists" (whatever that means in this context). Without any experience in finding small differences, they run headlong into such tests and create invalid results. Unfortunately upstanding citizens, including yourself, run with those results if they agree with their point of view in audio. Witness how you said you had run blind tests and you didn't find people telling the difference between 32 kHz sampling and higher. Even non-trained listeners easily passed that test, as reported in the debate thread. That invalidates your testing then, does it not?
 
The problem though, Tim, is who created the errors in the first place and who is credible in accepting the results of a test, because it comes across that Arny does not accept the results of the tests Amir has done, even though he proposed them in the first place.
This is what is remarkable about these discussions. Arny created a challenge that he presented as unbeatable for many years across thousands of people. He even proposed the tool to be used, foobar2000 - an ABX tool that makes it difficult to use segment analysis to find differences. Despite my completely shattering the supposed outcome of such a test, nothing seems to have changed in the fervor with which arguments are made.

I mean, it is not like I countered with sighted tests. Or complained about running a double blind test. I complied and showed my belief in the value of such tests. But I am still counted as the enemy by Arny and crew. Why? Because the outcome is disliked.

Ultimately that is what we are dealing with: this is not an argument about "science" but protecting beliefs.

BTW, I should note that Tim has been one of the most accepting members of the objectivity camp in this regard, so I am not talking about him but rather the same group of people continuing to fight against the results of the very test they advocate.
 
The problem though, Tim, is who created the errors in the first place and who is credible in accepting the results of a test, because it comes across that Arny does not accept the results of the tests Amir has done, even though he proposed them in the first place.
Indeed
This is unfortunately further compounded by the ABX test being casually pushed for many years and its results accepted as given; especially in the case of no audible difference with digital files (say hi-rez vs 16-bit, transparency of downsampling/decimation, etc.).
Not just other ABX tests but his exact same ABX test.
This then goes full circle with regards to listener training/experience/practice... again :D

Bowing out now because what is raised will not be concluded to a satisfactory level with all parties.
Cheers
Orb
Certainly not with the 3 parties who all seem to be in agreement.
I know the other person's argument can seem absurd through the prism of our long-held beliefs, particularly when those beliefs have become internalised without any scrutiny as to their logic or "validity". I have tried to restate your argument, Tim, a couple of times now in what I saw as your main objections - science & fallibility, etc. - so I am trying hard to come over to your side to some degree, but failing to do so because logic keeps getting in my way.
 
This is what is remarkable about these discussions. Arny created a challenge that he presented as unbeatable for many years across thousands of people. He even proposed the tool to be used, foobar2000 - an ABX tool that makes it difficult to use segment analysis to find differences. Despite my completely shattering the supposed outcome of such a test, nothing seems to have changed in the fervor with which arguments are made.

I mean, it is not like I countered with sighted tests. Or complained about running a double blind test. I complied and showed my belief in the value of such tests. But I am still counted as the enemy by Arny and crew. Why? Because the outcome is disliked.

Ultimately that is what we are dealing with: this is not an argument about "science" but protecting beliefs.

BTW, I should note that Tim has been one of the most accepting members of the objectivity camp in this regard, so I am not talking about him but rather the same group of people continuing to fight against the results of the very test they advocate.

I believe I already mentioned that I started a thread on PinkfishMedia called "Sorting out faith-based from evidence-based", from which Maxflinn is a refugee. The same arguments were put forth on that thread as are seen here, & the same faith-based attitude was on display, despite the intention of the thread being in the title. I even had people asking me what my intention was in starting the thread :)
 
......

But yes, you are right that DIY tests by a group of audiophiles can be full of errors because they are not "scientists" (whatever that means in this context). .........

Ooh, that reminds me; with yesterday being 21st August, I'm reminded of how even scientists can err and go well off the rails in terms of professional approach....
Specifically the mess around the Demon Core: using a freaking screwdriver (who would think a screwdriver could slip - shocked, I say, shocked :) ) to control criticality, and on another occasion stacking tungsten carbide bricks to reflect neutrons and monitor criticality (only to drop a brick and trigger the effect; nothing like being careless and without safeguards around a potential critical chain reaction).
Geniuses for sure, but they really needed to work out the implications of their methodology on some things :D
Terrible accidents and results, but it is something you would expect of hobbyists, not professional scientists.

Moral of the story, plenty!
Cheers
Orb
 
OK, Tim, when you are ready to actually deal with the points I raised then we can have a meaningful discussion - this "no, I'm not, you are" type of debate goes nowhere.

I've stated it before & will re-state it again - a test has a particular target that it is examining & a sensitivity range - in this instance the target is establishing if there is an audible difference between RB & 24/96 audio files. Now, we are all agreed that it is going to be a small difference & therefore the test's sensitivity requirements are defined by this stipulation. When the order of sensitivity gets down to the JND or small-impairment range, more controls are needed than when testing for large impairments (20% distortion, for instance).

If the appropriate controls that are necessary to guarantee this sensitivity are not in place, then the results are not to be trusted. I call this an invalid test, as I said before. You seem to want to say that some results may well be correct & some wrong. The problem with your position is that you have no way of evaluating which are correct & which are wrong without further testing, i.e. controlling the factors that should have been controlled in the first place.

ArnyK is citing certain criteria as his reasons for dismissing the test results as "invalid" - things like honesty, IMD, etc. What is different about his position vs mine? If appropriate controls were in place he wouldn't have these objections to the results - they would already have been dealt with by the controls - that's what they're for!! He has even agreed to this, but followed it with the ad infinitum proviso - that in his view the necessary controls are never-ending: "It goes on..........."

So, please don't post back "no, you're wrong" & expect a reply!

I have to agree with Tim about who is being circular here.

What Meyer and Moran showed quite well, in my opinion, are two things. One, that AD/DA conversion is quite transparent for the most part. Two, that the night-and-day differences roundly put forth then and since by the subjective audio press, and on most audio forums by audiophiles, were not true.

Could the M&M tests have been done in a better way to tease out smaller differences that might be real between hirez and redbook? Yes, and so they don't rise to the level of showing there is absolutely no perceptible difference. They are plenty good enough to say there is no night-and-day difference. Plenty good enough to say any difference is on the relatively small end of the spectrum. Tests done less rigorously than M&M quite reliably show differences that aren't exactly large.

I have known people who couldn't stand music passed through AD/DA because it bleached out all the life. Someone who claimed to hear whether it was redbook or hirez playing in the background while you talked to them over the phone. People who wanted hirez only in the car. I think M&M are quite sufficient to validly say those claims make no sense. I think similarly of Arny's files at a somewhat lower level of validity.

Tim has said more than once as well that once differences have gotten small enough that such advanced testing is required to tease it out near the very margins of human audibility then it has passed below a level he is going to worry about. So I don't see him being circular in his view of things.
 
While that is true, negatives in general don't seem to get nearly as much publicity, and for obvious reasons.
What Arny? The situation is completely the opposite. Take this test of yours:

[attached image: i-nTpXsqF-L.png]


This test was performed in 1984 with you as one of the three participants. Yet it was not until 2014 that its content was published, by me, after buying that issue of the magazine. Its own author - you, Arny - did not do that for some 30 years. If positive outcomes get more publicity, how did this happen?

Contrast that with other tests of amplifiers that had a negative outcome. I bet I can find thousands of references to those.

So it is positive tests that get no publicity for "good reason." They invalidate our militant views that such outcomes are impossible.
 
If I had a free choice in the matter a lot more tests would be run and reported, and there would be more careful proctoring.
??? Arny, when asked how many of the thousands of tests you say you have run have been documented, you said just the one amplifier test I quoted. Just one! If you believe in what you just said, how come this is your track record in publishing such tests?

While the test tools themselves could have more self-checking, there's nothing to keep someone from totally gaming the system and providing utterly fraudulent results.

For example, falsifying the test logs is trivial. Falsifying the identity of the files being compared is trivial. Test gear could be attached to the monitoring system to tell the desired results. While an attempt was made to put some self-testing of the monitoring system into the procedures, the self-tests themselves can be misapplied and misinterpreted. It goes on...
OK, so is this fraudulent Arny?

[attached image: i-zfdGz3C-XL.png]


I certainly would not assume it is fraudulent.

Yes, it is possible that you all lied in the above test to get a sensational headline. But I won't go there at all unless I have some evidence in my hand. You have none in yours. I have reported my results as-is, including mistakes and my searches for flaws. That included occasional misses. I could have edited those out, but winning this kind of Internet argument is not remotely worth losing your reputation for honesty. I post under my real name, so the stakes are quite high.

This is on top of my demonstrating for years why I could potentially pass such tests: my training. So this is not a random fluke. Further, a few other people have managed to hear some of these differences. Are we all frauds? I hope not :).
 
I have to agree with Tim about who is being circular here.

What Meyer and Moran showed quite well, in my opinion, are two things. One, that AD/DA conversion is quite transparent for the most part. Two, that the night-and-day differences roundly put forth then and since by the subjective audio press, and on most audio forums by audiophiles, were not true.

Could the M&M tests have been done in a better way to tease out smaller differences that might be real between hirez and redbook? Yes, and so they don't rise to the level of showing there is absolutely no perceptible difference. They are plenty good enough to say there is no night-and-day difference. Plenty good enough to say any difference is on the relatively small end of the spectrum. Tests done less rigorously than M&M quite reliably show differences that aren't exactly large.
Leaving aside the M&M test - there's no argument that the differences are small. I don't believe I've said otherwise - in fact, that's the whole point of what I'm saying - tests for small differences need tight controls in order to ensure a test sensitive enough to reveal audibility, if it exists. If you are railing against the common hyperbole of audiophiles, that's not in question in anything I've said.

I have known people who couldn't stand music passed through AD/DA because it bleached out all the life. Someone who claimed to hear whether it was redbook or hirez playing in the background while you talked to them over the phone. People who wanted hirez only in the car. I think M&M are quite sufficient to validly say those claims make no sense. I think similarly of Arny's files at a somewhat lower level of validity.

Tim has said more than once as well that once differences have gotten small enough that such advanced testing is required to tease it out near the very margins of human audibility then it has passed below a level he is going to worry about. So I don't see him being circular in his view of things.
Sure, I've seen him state that these differences are below the level that he is interested in. No problem with that. But again you seem to be trying to make the case that ArnyK's test was somehow to prove that audiophiles used hyperbole? I don't believe that was/is his reason for the test. As Amir has pointed out many times now - his claim was that no one could hear ANY differences between the files & that no golden-ear listeners actually exist.

You quote my post yet reply to nothing in it - instead you create another argument which I didn't make. Your points seem to be arguing another case, not in question & not related to the test.
 
So, if I read you correctly, your only solution for a valid test is to have the test proctored by someone like J_J?
I take it then that we should, according to this view, discount all unproctored ABX results (positive & null) as suspicious & discount any conclusions/indications/suggestions arising from them?

Let's put it this way. Credibility is not an all-or-nothing sort of thing. It comes in degrees. If we have a result that is contrary to a great deal of previous, seemingly credible evidence, then it needs more than trivial amounts of credibility to achieve comparable or superior status.

If a person with JJ's status and reputation in the industry put his name on a listening test, that would be very meaningful to a lot of people. In the case of Meyer and Moran, the approval of the AES editorial review committee has a great deal of weight.

Right now the amount of evidence we have is trivial, and it doesn't all go the same way.
 
Let's put it this way. Credibility is not an all-or-nothing sort of thing. It comes in degrees. If we have a result that is contrary to a great deal of previous, seemingly credible evidence, then it needs more than trivial amounts of credibility to achieve comparable or superior status.

If a person with JJ's status and reputation in the industry put his name on a listening test, that would be very meaningful to a lot of people. In the case of Meyer and Moran, the approval of the AES editorial review committee has a great deal of weight.

Right now the amount of evidence we have is trivial, and it doesn't all go the same way.
I have sympathy for your position, Arny but why would you expect all results to go the same way?

Can you tell us what the "great deal of previously seemingly credible evidence" is, please?
 
So, if I read you correctly, your only solution for a valid test is to have the test proctored by someone like J_J?

Incorrect. A test proctored by the right people would go a long way to support the test's credibility. I'm not going to be bullied into saying that there is one and only one way that a test can be credible. That would be an excluded middle argument and I try very hard to avoid making them.

I take it then that we should, according to this view, discount all unproctored ABX results (positive & null) as suspicious & discount any conclusions/indications/suggestions arising from them?

Looks to me like yet another excluded middle bully-job. Assigning a value of less than utter certainty and sufficiency to the outcome of a minuscule number of tests is not the same as discounting them. A lot of the ABX tests I've been associated with involved dozens of listeners and a committee of experienced test organizers and proctors.

Would I rate the 1984 "Some amplifiers do sound different" tests far higher? Yes. For the time being I'll leave common sense explanations of why to others, in the hope that there are still people posting here who have some common sense left.

Everything in the real world has a weight which is somewhere between infinitesimal and huge but still possible to enumerate.

A minuscule number of positive results among a tiny population of mixed results that are obtained by means of an almost vanishing number of unsupervised individuals from various backgrounds is not very good evidence.

It grieves me to say stuff like this to adults because they should already know it well. Rush to judgement, anybody?
 
I'm not Arny, but no, I don't think you read correctly. He's saying that he would prefer better monitoring and control to prevent possible errors and cheating that could invalidate the results. It sounds like you're back on your personal definition of valid again, the one that makes the absence of perfect (by your definition) controls and protocols the same as none (sighted long-term listening). This is a misuse of "valid" in the service of rationalization...

val·id

(of an argument or point) having a sound basis in logic or fact; reasonable or cogent.
"a valid criticism"

...and it invalidates your argument. :)

Tim

All good points.
 
??? Arny, when asked how many of the thousands of tests you say you have run have been documented, you said just the one amplifier test I quoted. Just one! If you believe in what you just said, how come this is your track record in publishing such tests?

Egregious false claim. Many of the tests I was involved with (I did not solely run the test mentioned above) were documented in various places such as Audio Magazine and Stereo Review.

I don't think that making false claims about other people's work enhances the credibility of one's own work, which is a lesson that some seem to need to learn. In fact it seems to seriously detract from it. Based on this false claim which has been repeated in kind several times on AVS, I see good reason to discount every claim about listening test outcomes that some have made. If someone is going to make false claims like this, where will they stop?
 
If a person with JJ's status and reputation in the industry put his name on a listening test, that would be very meaningful to a lot of people.
I was JJ's (senior) boss. My results, then, are not those of a random person reporting something. So if you want to go by credentials, mine are quite proper and well above those of countless others whose test results you accept (see below).

In the case of Meyer and Moran, the approval of the AES editorial review committee has a great deal of weight.
In this specific regard, it should have no weight at all. No one at the journal is vouching for the ethics of the tests the authors conducted. They have simply reviewed a paper and thought it rose above the minimum standard for publication. The test could have been completely fake and no one at the journal would have caught it.

This kind of myth seems never to die. People keep thinking someone from the journal showed up, looked over people's shoulders to make sure they were testing things correctly, and audited the process, when in reality no technical peer review is ever conducted that way. I have explained this many times, but the myth keeps getting repeated, per the above. I know you have seen my explanation, Arny, in various threads. Why do you keep propagating the myth?

And remember, these are the credentials of the authors:

[attached image: i-GwBpSHf-X2.png]


They seem like pretty nice people, but they fall well short of the standard by which you thought JJ qualified. They lack any prior experience in this field whatsoever.

I should be clear that I trust everything they have written as not being fraudulent. Due to lack of controls however, I don't trust the results of their work.

Right now the amount of evidence we have is trivial, and it doesn't all go the same way.
There is nothing trivial about it with respect to forum arguments. Had there not been such a development, you wouldn't be here and on AVS arguing so hard to create doubt about it.

The evidence is super strong in two areas:

1. Trained/expert listeners have far better abilities than the mass of the public or even "audiophiles." The lack of their use is against best practices of the industry/research community and hence seriously undermines any test which did without them.

2. The test created by Arny himself and positioned as an impossibility - going as low as saying 32 kHz sampling is transparent - was falsified. According to your own posts, for some 14(?) years no one had managed to pass such tests. But now multiple people have (to varying degrees).

#1 is 100% supported by industry/research practices. It is a new concept for many on forums, but not in the real world. People are now getting educated and hopefully won't go around saying the results of one set of blind tests apply to everyone else.

#2 speaks for itself. No longer can you say, Arny, that this and that test says our hearing is that dull. You created a test for that and we passed it. It doesn't get better than this, as the commercial goes :).
 
I have sympathy for your position, Arny but why would you expect all results to go the same way?

I don't. This is getting very tiring. I guess I need to point out a few things about statistics 101. Statistics are based on more than a tiny number of samples. The samples don't have to all go the same way, but in the end a lot of them do have to go the same way.
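The statistics-101 point can be made concrete with a quick binomial calculation: a single short run proves little, but pooled trials at the same hit rate become decisive. A minimal Python sketch; the specific scores (10/16, 50/80) are illustrative assumptions, not results from any test discussed here:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of scoring
    k or better by luck alone when each trial is a coin flip."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One short run: 10/16 correct looks encouraging, but is weak evidence.
p_single = binom_tail(10, 16)

# Five such runs pooled: 50/80 at the same 62.5% hit rate.
p_pooled = binom_tail(50, 80)

print(f"p-value, one 10/16 run: {p_single:.3f}")
print(f"p-value, pooled 50/80:  {p_pooled:.4f}")
```

A lone 10/16 score occurs by luck roughly a quarter of the time, so it is not significant at the usual 0.05 level, while the same hit rate sustained over the pooled trials is - which is the sense in which "a lot of them do have to go the same way."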

Can you tell us what the "great deal of previously seemingly credible evidence" is, please?

I gave a real world example which would seem to suffice, perchance someone had no other background in statistics.

Here's a memory test: Please give the names of the two experimenters who are associated with a previous experiment related to high resolution audio that was published in the JAES. At this point, it seems like this salient fact has been forgotten and I need to do a sanity check...
 
