Why 24/192 is a bad idea

...Have you ever played music through software that lets you vary the bit depth as it plays? ...
--Ethan

No, but I've compared CDs burned at 16 bits or 14 bits with the original 24-bit version (burned onto DVD-A) from which I made them. In a single-blind test, with me as the only listener but someone else controlling the source, it's fairly easy to tell the difference, usually needing only about 10-15 seconds of listening (on some types of music, only 5 seconds or less).
 
Of interest only if you believe that noise floor is the only relevant measurement for bit-depth in digital audio systems. If that were true, we'd really only need 13-14 bits, not even 16.

Correct. 16 bits builds in some extra room, "just in case", just as audiophiles suggest we should.
 
Of interest only if you believe that noise floor is the only relevant measurement for bit-depth in digital audio systems. If that were true, we'd really only need 13-14 bits, not even 16. For that matter, we'd only need 320k or good VBR MP3s or Ogg Vorbis :D

While designing CELT (which is now part of the Opus codec), I had to do a lot of tuning. CELT is a bit special for an audio codec in that it encodes the gain of each frequency band (on a Bark scale) and then explicitly allocates a certain bit depth to each of these bands. Just to give a rough idea, CELT sounds transparent to my ears, for most content, at around 128 kb/s. Now, I'm not that good at hearing differences, so I got other people to listen to it, and while they can certainly hear more artefacts than I can, nobody has yet been able to hear any artefact beyond 256 kb/s (on the latest version; some early versions suck). But just to be safe, CELT actually supports bit-rates that go up to 512 kb/s.

So what's the bit depth when encoding at 512 kb/s? Lower frequencies get a bit depth of around 8-9 bits, while the highest band (16-20 kHz) gets only 3 bits of depth. And that's for a crazily high bitrate. Keep in mind that this is after encoding the gain (i.e. after eliminating the effect of the dynamic range). But what it shows is that within a critical band, nobody's even close to hearing noise at 48 dB SNR in the low frequencies and at 18 dB SNR at higher frequencies. Human ears are way overrated. We're good at hearing the "spectral envelope" with a good dynamic range, but that's about it.
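The roughly 6-dB-per-bit arithmetic behind those SNR figures can be sketched in a few lines of Python (an illustration of the rule of thumb only, not CELT's actual allocator):

```python
# Approximate SNR implied by a given bit depth, using the classic
# ~6.02 dB-per-bit rule for uniform quantization.
def band_snr_db(bits: int) -> float:
    return 6.02 * bits

# Per-band depths mentioned above (illustrative values, not CELT's tables):
print(round(band_snr_db(8)))   # low bands: ~48 dB SNR
print(round(band_snr_db(3)))   # 16-20 kHz band: ~18 dB SNR
```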
 

An excellent illustration of why people who care about sound quality (audiophiles?) tend to dismiss this whole argument about why 24/192 is unnecessary.
 
Jean-Marc was writing about lossy compression, not the quality of "standard-resolution" Wave files.

--Ethan

Read the posts he was responding to, and please stop "shooting from the hip" without thinking.
 
Correct. 16 bits builds in some extra room, "just in case", just as audiophiles suggest we should.
Whether it does that or not depends on a number of factors. Top of that list is the dynamic range we want to preserve. Using the assumption that we can reasonably reproduce 20 bits, the range we would want to preserve is around 120 dB. Plotting 16-bit samples *with* proper dither against the threshold of hearing gets us this (from Bob Stuart's paper):

[Figure: 16-bit dithered noise floor plotted against the threshold of hearing, from Bob Stuart's paper]

As we see, 16 bits is no longer sufficient. Even 18 doesn't do it. We have to get to 20 to achieve provable fidelity based on this analysis.
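The arithmetic can be checked against the standard full-scale-sine formula, 6.02N + 1.76 dB, which assumes flat, unshaped quantization noise (a sketch only; shaped dither changes the picture, as discussed below):

```python
# Theoretical dynamic range of an N-bit channel for a full-scale sine,
# assuming flat (unshaped) quantization noise: 6.02*N + 1.76 dB.
def dynamic_range_db(bits: int) -> float:
    return 6.02 * bits + 1.76

for n in (16, 18, 20):
    print(n, round(dynamic_range_db(n), 1))
# Only the 20-bit figure clears a 120 dB target without noise shaping.
```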

However, if we apply noise shaping and push some of that noise into high frequencies, we can get there. To do that well, it would be nice to extend the bandwidth of the signal so that we have plenty of space above 20 kHz to "park" the dither noise. At 44.1 kHz we are pretty cramped. At higher sampling rates we do get the real estate and can enjoy the freedom to push the noise up there.
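A minimal sketch of the idea, using first-order error-feedback shaping (hypothetical parameters; real shaped dithers are higher order and tuned to the ear's sensitivity curve):

```python
import cmath
import math

def requantize(x, bits, shaped):
    """Requantize samples in [-1, 1) to `bits` bits, optionally with
    first-order error feedback, which pushes quantization noise
    toward high frequencies."""
    step = 2.0 / (2 ** bits)
    out, e = [], 0.0
    for s in x:
        v = s + (e if shaped else 0.0)
        y = round(v / step) * step
        e = v - y                  # error fed into the next sample (1 - z^-1)
        out.append(y)
    return out

def low_band_energy(err, frac):
    """Energy of the DFT of `err` in the lowest `frac` of the spectrum."""
    n = len(err)
    return sum(abs(sum(err[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                       for t in range(n))) ** 2
               for k in range(1, int(n * frac)))

n = 256
x = [0.5 * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
flat   = [y - s for y, s in zip(requantize(x, 6, False), x)]
shaped = [y - s for y, s in zip(requantize(x, 6, True), x)]
# Shaping moves the quantization error out of the low (audible) band:
assert low_band_energy(shaped, 0.1) < low_band_energy(flat, 0.1)
```

Total error power is similar in both cases; shaping only relocates it, which is why extra bandwidth above the audible range is useful as a place to put it.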

This is why a higher sampling rate may be useful even if you are not using it to increase the bandwidth of what we record/play, but rather to get the noise/distortion level down. I think there was an earlier reference to this benefit.

In general, it is important to think of digital systems as a rectangle described by sampling rate on one axis and bit depth on the other. We can borrow from one to help the other. Think of how SACD gets this done with just "one bit."
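The textbook formula for an ideal delta-sigma modulator makes the trade explicit. These are ideal-case numbers with illustrative order and oversampling ratio; real one-bit modulators such as SACD's fall well short of the ideal at high orders:

```python
import math

def ideal_dsm_snr_db(bits: int, order: int, osr: int) -> float:
    """Peak SNR of an *ideal* order-L delta-sigma modulator with a
    `bits`-deep quantizer at oversampling ratio `osr` (textbook formula;
    real modulators fall short of it, especially at high orders)."""
    return (6.02 * bits + 1.76
            - 10 * math.log10(math.pi ** (2 * order) / (2 * order + 1))
            + (2 * order + 1) * 10 * math.log10(osr))

# A 1-bit quantizer recovers dynamic range from speed:
print(round(ideal_dsm_snr_db(1, 1, 64), 1))   # 1st order, 64x oversampling -> 56.8
print(round(ideal_dsm_snr_db(1, 2, 64), 1))   # 2nd order, same rate -> 85.2
```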

And here is the problem: I am not sure how many people making music understand these complexities as they convert their music from 24 bits to 16. Instead of leaving that to chance, it is best to get the full-resolution bits from them; then we can convert to 16 bits with proper noise shaping should we desire smaller file sizes, and can adjust the sampling rate to accommodate the same.
 
I tend to agree. The considerable investment in DVD-A and SACD was a flop. I'll bet against another effort with physical media.

I would be against another physical format as well, but if Neil Young or Apple (or both) can make a good-sounding high-resolution file format, then I think it will likely take off.

Hirez benefits are real and substantial so I want to experience as much hirez as possible. I just love the feeling of getting "sucked into" the music and enjoying the performance and forgetting all about this gear stuff. That's the sign of good playback imho.
 
All of this technical talk is fascinating, really, but I'm a bit short on time. Can anyone point me to the scholarly paper that details the discovery that middle-aged men with expensive audio toys can now hear beyond 20 kHz?

Tim

Tim, this content from a CalTech professor should help.

http://www.cco.caltech.edu/~boyk/spectra/spectra.htm

Given the existence of musical-instrument energy above 20 kilohertz, it is natural to ask whether the energy matters to human perception or music recording. The common view is that energy above 20 kHz does not matter, but AES preprint 3207 by Oohashi et al. claims that reproduced sound above 26 kHz "induces activation of alpha-EEG (electroencephalogram) rhythms that persist in the absence of high frequency stimulation, and can affect perception of sound quality." [4]
Oohashi and his colleagues recorded gamelan to a bandwidth of 60 kHz, and played back the recording to listeners through a speaker system with an extra tweeter for the range above 26 kHz. This tweeter was driven by its own amplifier, and the 26 kHz electronic crossover before the amplifier used steep filters. The experimenters found that the listeners' EEGs and their subjective ratings of the sound quality were affected by whether this "ultra-tweeter" was on or off, even though the listeners explicitly denied that the reproduced sound was affected by the ultra-tweeter, and also denied, when presented with the ultrasonics alone, that any sound at all was being played.
From the fact that changes in subjects' EEGs "persist in the absence of high frequency stimulation," Oohashi and his colleagues infer that in audio comparisons, a substantial silent period is required between successive samples to avoid the second evaluation's being corrupted by "hangover" of reaction to the first.
The preprint gives photos of EEG results for only three of sixteen subjects. I hope that more will be published.

In a paper published in Science, Lenhardt et al. report that "bone-conducted ultrasonic hearing has been found capable of supporting frequency discrimination and speech detection in normal, older hearing-impaired, and profoundly deaf human subjects." [5] They speculate that the saccule may be involved, this being "an otolithic organ that responds to acceleration and gravity and may be responsible for transduction of sound after destruction of the cochlea," and they further point out that the saccule has neural cross-connections with the cochlea. [6]
 
Using the assumption that we can reasonably reproduce 20 bits, the range we would want to preserve then is around 120 db.
...

As we see, 16 bits is no longer sufficient. Even 18 doesn't do it. We have to get to 20 to achieve provable fidelity based on this analysis.

In other words, assuming we want 20 bits, we conclude that we need 20 bits. Hard to argue against that, aside from the fact that it's a bit circular.

My original point was that as long as you listen at a level where the max amplitude is 96 dB SPL (which I consider loud enough), just considering ATH is enough to conclude that 16 bit is enough. Now, if you listen at a level above that, 16 bits is still enough, but then you have to involve simultaneous masking to show that it is. In other words, if you have music playing at 120 dB, it's going to severely degrade the sensitivity of your hearing. Not to mention -- as Ethan pointed out -- that finding a room where the background noise is below ATH is not exactly easy.
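The ATH curve in that argument is often approximated with Terhardt's formula; a quick sketch (a common approximation for young listeners, values purely illustrative):

```python
import math

def ath_db_spl(f_hz: float) -> float:
    """Terhardt's approximation of the absolute threshold of hearing
    in dB SPL; the ear is most sensitive around 3-4 kHz."""
    f = f_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Sensitivity varies enormously across the band, which is why a flat
# noise-floor number alone says little about audibility:
print(round(ath_db_spl(50), 1), round(ath_db_spl(1000), 1), round(ath_db_spl(3300), 1))
```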
 
Original quote from Ethan Winer: Jean-Marc was writing about lossy compression, not the quality of "standard-resolution" Wave files.

Read the posts he was responding to, and please stop "shooting from the hip" without thinking.

My point is that lossy compression (at least with the CELT design) can teach us about noise sensitivity. What it teaches us is that most of the bits in a wave file are used to "cover dynamic range". Once you have that covered, the actual depth you need for each band isn't that much. So in the original example of 16 bit samples with high frequencies being 8 bits below the "peak", you still have 8 bits of actual resolution, which is way more than enough.
 
Tim, this content from a CalTech professor should help.

http://www.cco.caltech.edu/~boyk/spectra/spectra.htm

It doesn't help much. This is about the only thing that even indicates a result:

The experimenters found that the listeners' EEGs and their subjective ratings of the sound quality were affected by whether this "ultra-tweeter" was on or off

And that's about as broad and non-committal a statement as you're likely to get. Affected how? Under what testing conditions? If I came in here with an objectivist argument backed up by no more than that, it would be shot down in minutes. I'll look at the link later. Maybe there is something more there.

Tim
 
I looked. Nothing else there. Mostly about ultrasonic content from musical instruments. Need a little more about these listening tests. Anybody got links to the Oohashi study itself?

Tim
 
In other words, assuming we want 20 bits, we conclude that we need 20 bits. Hard to argue against that, aside from the fact that it's a bit circular.
No, I showed that you can make do with 16 if you use the proper signal processing.

My original point was that as long as you listen at a level where the max amplitude is 96 dB SPL (which I consider loud enough), just considering ATH is enough to conclude that 16 bit is enough.
Now *that* is circular :D. You pick 96 based on what? That it happens to be the dynamic range of 16 bits/CD? You can't solve an equation by choosing the values of its variables yourself; the requirement has to come from what the customer needs. The customer here is a high-end one, who wants the absolute best fidelity and nothing lost in the capture of the source. For that, we can look to some research:

http://www.aes.org/e-lib/browse.cfm?elib=11981
Author: Fielder, Louis D.

"Dynamic Range Requirement for Subjective Noise Free Reproduction of Music

A dynamic range of 118 dB is determined necessary for subjective noise-free reproduction of music in a dithered digital audio recorder. Maximum peak sound levels in music are compared to the minimum discernible level of white noise in a quiet listening situation. Microphone noise limitations, monitoring loudspeaker capabilities, and performance environment noise levels are also considered.
....
The recent emergence of PCM recording techniques for music reproduction and the desire to standardize this format involves a re-examination of dynamic range requirements for natural music reproduction. Standardization of a 16 bit linear format would limit the dynamic range capability to 96 dB, and limit the quality of future PCM recorders if a wider range eventually became necessary.
....
The most accurate of previous examinations of dynamic range requirements was done by Fletcher [1] , who argued that 100 dB dynamic range was necessary.... Fletcher ignored the ear's ability to detect a noise source below that of the room noise by source localization.
...
For this particular microphone, the overload point is 130 dB and thus would allow the capturing of an equivalent dynamic range of 121 dB if peak levels of 130 dB exist in a performance. From the tabulation on peak sound levels close to musical instruments in Table 3, it is seen that musical instruments are capable of producing these high sound levels especially at distances less than 3 feet.
...
Four different microphones were measured which had overload levels between 120 to 140 decibels. They were all condenser microphones and as the graph shows the noise levels in the 3 - 7 kHz region were within 5 dB of each other. In summary, it is shown that close talking techniques and the proper selection of a microphone produces no limitation or reduction on the dynamic range requirement as determined by the playback experiments. Even a natural miking technique results in only a 9 dB white noise threshold.

In conclusion, several experiments were made to determine the dynamic range requirement for a recording system to produce no audible hiss when used to play back music at natural listening levels. These experiments resulted in a dynamic range requirement of 118 dB (non-amplified music), 124 dB (amplified music) for the professional, and 106 dB for the high quality consumer playback system."


http://www.aes.org/e-lib/browse.cfm?elib=7948
Author: Fielder, Louis D. (1995)

"Dynamic-Range Issues in the Modern Digital Audio Environment

The peak sound levels of music performances are combined with the audibility of noise in sound reproduction circumstances to yield a dynamic-range criterion for noise-free reproduction of music. This criterion is then examined in light of limitations due to microphones, analog-to-digital conversion, digital audio storage, low-bit-rate coders, digital-to-analog conversion, and loudspeakers. A dynamic range of over 120 dB is found to be necessary in the most demanding circumstances, requiring the reproduction of sound levels of up to 129 dB SPL. Present audio systems are shown to be challenged to yield these values.
....
A survey of the dynamic range capabilities of ADCs shows values of 90-110 dB, with the highest values for the best configurations of 20-bit word length converters. Analog Devices, Crystal Semiconductor, and UltraAnalog all make ADCs with dynamic ranges of 106-110 dB above 1 kHz. Unfortunately these values of dynamic-range performance are inadequate to meet the professional and most demanding of the consumer requirements, and techniques to increase the apparent dynamic-range characteristics are necessary."


Granted, not everyone requires such dynamic range. But if we are to establish an appropriate distribution specification, it had better accommodate all that we can throw at it. One has to remember that there is no better customer of music than the high-end buyer. They are the ones shelling out thousands, and often tens of thousands, of dollars on music, and are least apt to go and steal MP3s. So if you are going to set a standard, you had better take good care of them.

Now, if you listen at a level above that, 16 bits is still enough, but then you have to involve simultaneous masking to show that it is. In other words, if you have music playing at 120 dB, it's going to severely degrade the sensitivity of your hearing. Not to mention -- as Ethan pointed out -- that finding a room where the background noise is below ATH is not exactly easy.
Not really. I am not sitting there listening to a tone at 120 dB. A transient may last just a few milliseconds. Home theaters routinely hit 100+ dB; the THX spec, for example, requires 105 dB. I am not seeing warning signs on such equipment saying you are going to go deaf.

Now in the old days of slow Internet and expensive hard disks, sure, we could argue these points, and I used to do the same :). But technology and infrastructure have moved on. It is time we stopped knowingly short-changing the customer in the interest of economizing for its own sake. People are paying good money for music and like to feel, and be confident, that they are getting all the quality they could. Taking some of it out because we think they shouldn't have it doesn't make sense.
 
And on homes being quiet enough: again, I don't want a distribution format to be my limitation. Let it be my room. And there, we have to be careful and not just look at an SPL meter in a room. Our ear doesn't have flat response, so when measuring room noise, we had better look at its spectrum. From Fielder's paper:

[Figure: room noise spectra plotted against the hearing threshold, from Fielder's paper]

So we see that people already have rooms that are quieter than we need.
 
And that's about as broad and non-committal a statement as you're likely to get. Affected how? Under what testing conditions? If I came in here with an objectivist argument backed up by no more than that, it would be shot down in minutes. I'll look at the link later. Maybe there is something more there.

Tim
Tim, that is not the original article but an interpretation of the Japanese research, which is published in an AES paper. I think this is what you were looking for:

"5.2 Results
As shown in Fig. 11, results were obtained which supported, with a high level of significance, the perception of sound quality differences caused by high frequency components. The influence of high frequency components was indicated in several ways: subjects perceived the presented sound as softer, more reverberative, with a better instrumental balance, and more comfortable to the ear at a significance level of 1%, and as more rich in nuances at a significance level of 5%, when high frequency components were present."


And in the conclusions section:

"(3) The results from the subjective sound quality evaluation experiment by Scheffe's method indicated that the music containing high frequency components was perceived as more pleasant and rich in nuance than music from which high frequency components were eliminated."
 
I would be against another physical format as well, but if Neil Young or Apple (or both) can make a good-sounding high-resolution file format, then I think it will likely take off.

Hirez benefits are real and substantial so I want to experience as much hirez as possible. I just love the feeling of getting "sucked into" the music and enjoying the performance and forgetting all about this gear stuff. That's the sign of good playback imho.

If you are not getting "sucked off" into the music, with 16/44.1, then 24/96 will not be some cure all.
 
Plotting 16-bit samples *with* proper dither against the threshold of hearing gets us this (from Bob Stuart's paper):

Bob is using unshaped (white) TPDF dither. Though it's one possible 'proper' dither in the mathematical sense, this isn't the dither a professional would use if there were alternatives; even several low-power 'improper' Gaussian dithers will outperform it in practice. That, and you easily gain 30 dB+ at the midrange 'dip' with shaping, which Bob hasn't done. Most good dithers follow the ATH.

There's also a good case for using no dither at all. You get two bits back over Bob's figure.

[edit: actually, it just hit me that Bob's figure might be an invalid comparison -- his noise figures are a spectral density figure. He's overlaid them on a spectral amplitude plot of the ATH. Those aren't the same units even if you 'normalize' them. I'll think about it a bit more.]
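The units point can be made concrete: a spectral density only becomes a level comparable to an amplitude threshold once it is integrated over a bandwidth. A toy example with made-up numbers:

```python
import math

def band_level_db(density_db_per_hz: float, bandwidth_hz: float) -> float:
    """Total noise level of a flat spectral density integrated over a band.
    Overlaying a density curve directly on an amplitude threshold ignores
    this 10*log10(bandwidth) term."""
    return density_db_per_hz + 10 * math.log10(bandwidth_hz)

# A flat -120 dBFS/sqrt(Hz) density over a 20 kHz band totals much higher:
print(round(band_level_db(-120.0, 20000.0), 1))  # -> -77.0
```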

In general, it is important to think of digital systems as a rectangle described by sampling rate on one axis, and bit depth in the other. We can borrow from one to help the other. Think of how SACD gets this done with just "one bit."

This is a sound analogy, and necessary for understanding all the different proposals Bob makes at the end of the paper. The argument is 'how big do we make the rectangle?' I'm arguing it is already large enough with plenty of fiddle room, and the experimental record backs it up.

Monty
Xiph.Org
 
