Ask HN: Determine sound quality of a file, objectively?
65 points by ggeorgovassilis on Aug 19, 2024 | 45 comments
Is there a way to "objectively" determine the sound quality of an audio file? E.g. I sometimes come across MP3 files with a high bitrate, but they sound bad, which suggests that they were re-encoded from a bad/low-bitrate source. By "objectively" I mean something other than listening to them, e.g. an audio spectrogram, etc.


> I sometimes come across mp3 files with a high bitrate, but they sound bad which suggest that they were re-encoded from a bad/low-bitrate source.

You could try re-compressing the mp3 file to lower and lower bitrates and check the amplitude of differences. Since mp3 is a lossy codec, there will always be a slight difference, but you should see a sudden increase in difference when you surpass the "true" encoding bitrate.

You could probably write a script for it using ffmpeg and some other tools to generate a bitrate-difference chart.


Okay, I got nerd-sniped [0]. Here's the script I made:

    https://gist.github.com/paskozdilar/6095fe73c80ad21fda3f518177699149
This isn't an exact method, so it will only print bitrates and their respective "difference" parameter. In general, as the difference parameter rises, you're more likely to have reached the "true" mp3 quality, and a sudden jump in difference parameter is an almost certain indicator of true quality.

E.g. for a System Of A Down song "36", encoded at 320kbps, I get the following output:

    user@hostname:~/temp$ ./main.sh 36.mp3
    320: 0.075272
    256: 0.121475
    224: 0.160858
    192: 0.193726
    160: 0.237717
    128: 0.308029
    112: 0.363953
     96: 0.409012
     80: 0.454941
     64: 0.578598
     56: 1.105850
     48: 1.081100
     40: 0.629898
     32: 1.223129
Here, we have a jump immediately after 320kbps, which shows that this file is true 320kbps.

When I compressed the song to 64kbps and then re-compressed it to 320kbps, I get this result:

    user@hostname:~/temp$ ./main.sh 36-reencoded.mp3 
    320: 0.146484
    256: 0.146484
    224: 0.146484
    192: 0.146484
    160: 0.146484
    128: 0.146957
    112: 0.178665
     96: 0.210373
     80: 0.175171
     64: 0.211609
     56: 1.054886
     48: 0.944687
     40: 0.505280
     32: 0.852020
As we can see, the most significant jump happens from 64kbps to 56kbps, which confirms that this file's true bitrate is indeed 64kbps.
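To make the heuristic mechanical, one could pick the bitrate just above the largest jump in the difference parameter. A rough sketch (my own illustration, not part of the gist above), using the numbers from the re-encoded file:

```python
# (bitrate_kbps, difference) pairs copied from the re-encoded file's output.
results = [
    (320, 0.146484), (256, 0.146484), (224, 0.146484), (192, 0.146484),
    (160, 0.146484), (128, 0.146957), (112, 0.178665), (96, 0.210373),
    (80, 0.175171), (64, 0.211609), (56, 1.054886), (48, 0.944687),
    (40, 0.505280), (32, 0.852020),
]

def estimate_true_bitrate(results):
    """Estimate the true bitrate as the one sitting just above the
    largest jump in difference between adjacent bitrate steps."""
    jumps = [
        (results[i + 1][1] - results[i][1], results[i][0])
        for i in range(len(results) - 1)
    ]
    return max(jumps)[1]

print(estimate_true_bitrate(results))  # 64
```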

Though, as a sibling comment says, it's not confirmed that this kind of process works across encoders. I think it should, because MP3, as a lossy codec, mostly removes higher frequencies, and re-encoding a compressed signal at the same bitrate should remove fewer of them, because there are fewer to begin with. But I have no way to confirm this - I'd need an array of encoders to actually verify it.

[0] https://xkcd.com/356/


What an awesome thread. Someone posts a reasonable assumption, another one quickly validates it with a simple script. This is why I love Hacker News.


If you check the usernames, the one who suggested and the one who followed up are the same person. :-)


OMG, you’re right. They nerd-sniped themselves.


A tale as old as time, hahaha


Looks like there's a pretty big jump at 56 in the first version as well.


Seems that way. Probably because I'm using the maximum delta of the difference as the metric. Using some integral-like function (e.g. the average amplitude of all samples) would probably yield better results, but I have a hard time getting any useful data out of this script on my home computer. Weird.

I'll possibly work more on this topic, maybe I'll make a HN post.
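To illustrate why the metric choice matters, here's a toy comparison (my own sketch, not from the script) of the max-delta metric against a mean-absolute-difference metric. A single outlier sample dominates max-delta but barely moves the average:

```python
import numpy as np

def max_delta(a, b):
    # Metric used by the script: largest absolute sample difference.
    return float(np.max(np.abs(a - b)))

def mean_abs(a, b):
    # "Integral-like" alternative: average absolute sample difference.
    return float(np.mean(np.abs(a - b)))

reference = np.zeros(1000)
smooth = np.full(1000, 0.01)   # uniform small error everywhere
spiky = np.zeros(1000)
spiky[0] = 10.0                # one huge outlier; mean error is still 0.01

print(max_delta(reference, smooth), max_delta(reference, spiky))  # 0.01 10.0
print(mean_abs(reference, smooth), mean_abs(reference, spiky))    # 0.01 0.01
```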


That sounds right. It should go from all small steps to all big steps. Every step past the real bitrate is shucking data humans can't hear well, more and more aggressively. Every step before the real bitrate is looking for data to shuck but not finding it.

I just realized, social media sites may do something similar to save space. I've noticed whenever I upscale my video resolution, they aren't fooled. At first the video will be hosted in 4k, but later that option disappears, and only the 1080P and lower resolutions are left. But sharp footage from my real 4k camera stays hosted at 4k. I figured they must take a screenshot and determine the blurriness, but now I think they might just look at those re-encoding stats they already have.


That's more than I ever could have asked for. Amazing work, thank you so much! What's the script's license?


> What's the script's license?

For legal purposes, let's say GNU All-permissive License. I promise not to sue :')


Does this work if the encoders are different?

Are encoders aware that you are transcoding rather than working on raw waveform data? Presumably if working on waveform data the higher bitrates will seek to maintain the artifacts due to the earlier transcoding. Can one rely this much on the behaviour of an encoder?


> Does this work if the encoders are different?

I have no idea, honestly.

I've tested my script (from a sibling comment of yours) with twolame [0] encoder, and I don't know if the results are good enough to indicate 64kbps true bitrate:

    user@host:~/temp$ ./main.sh 36-twolame-reencoded.mp3 
    320: 0.032303
    256: 0.045563
    224: 0.059769
    192: 0.074631
    160: 0.204269
    128: 0.229187
    112: 0.309540
     96: 0.302414
     80: 0.310181
     64: 0.367279
     56: 1.133713
     48: 1.106262
     40: 0.444473
     32: 1.030777
[0] https://www.twolame.org/


Not all MP3 codecs are the same, and some include a feature called perceptual compression. Thus, the predominant features of the audio one actually hears are preserved over high-frequency overtones most people find unpleasant. As a result, the self-similarity of the small file is improved, while the perceived sound quality is better at lower bitrates (note that not all psychoacoustic models are the same).

This is why some encoders sound like garbage even at relatively high encoded quality levels.

Let's be clear: most people can't hear the difference (e.g. digital synth music makes smaller files with duplicate audio), and others simply adapt to poor sound quality, given they hit the $2.34 hardware mixer chip's interpolation limitations long before the audio codec's limits (e.g. Bluetooth can be really iffy).

Don't worry about it, and maybe pick up a stringed instrument if you want a quality experience =3


A private music tracker called REDacted has a good overview [0] on this subject (link should be SFW, and is on a different domain than the tracker itself). What.CD, before its demise, was the first I had seen demonstrating this, but older trackers (OiNK, Waffles.fm, et al.) may have also had it.

Objectivity aside, IMO the easiest way to tell when listening is paying attention to high-frequency sounds, especially hi-hat cymbals. Unless lossy encoders have gotten remarkably better since I last tried, there’s always a marked loss of shimmer / reverb on those.

[0]: https://interviewfor.red/en/spectrals.html


Spectrals are easily the best method. It’ll give you a subjective idea, but it’s never going to be truly perfect with lossy encodes. If you’re looking at FLAC then you can tell pretty easily if something is a lossy transcode, but otherwise knowing if something is 192kbps vs 128kbps is going to be difficult. For really low bit rate reencoded as 320 that should be obvious on the spectrals, but it won’t tell you anything definitive beyond “well, it was a much worse bit rate previously”… which is useful, but only to a point.


Great post, thank you! I was expecting to see frequency blocks because of quantization, but I'm positively surprised that there are MP3 compression tell-tales like the cut-off and gaps.


https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Audio...

PEAQ is an algorithm and scoring system that takes psychoacoustic modeling into account. When I looked into this more than ten years ago, I managed to find a command line utility called pqevalaudio or something that I could just use to assign a score to a file.


Normally those algorithms expect a clean reference version of the audio, as they quantify impairment. It doesn't sound like the OP has a reference version.



Interesting.

The README was pretty sparse and I wanted to read more about what was actually going on, so I poked around a bit. First of all, wikipedia article if you just want a birds eye view [0].

The website the README links to as the original isn’t secure, but you can read the paper talking about this particular way of measuring perceptual audio quality here if you don’t mind living a little dangerously[1]. I also found this paper, which I haven’t had a chance to read yet, but it gives a “zoomed out” view, looking at a bunch of different perceptual audio quality measures [2].

[0] https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Audio...

[1] https://www-mmsp.ece.mcgill.ca/Documents/Reports/2002/KabalR...

[2] https://dl.acm.org/doi/pdf/10.1109/TASLP.2021.3069302


Try Spek [0]

It's a popular tool for verifying the quality of music downloaded via P2P platforms. Re-encoding YouTube rips is popular - especially bootleg tracks - and this helps weed them out.

[0] http://help.spek.cc/


Spectrograms are not always a reliable way to judge audio quality of lossy-encoded files.

Lossy codecs use psychoacoustic models which are designed to save space by discarding data that's unlikely to be missed by humans - often frequencies that would be masked by other sounds.

This process often results in what looks like 'holes' in the spectrogram, if compared to a lossless 'smooth' version.

Most people then (wrongly) assume that the file sounds bad, when actually the bits saved there may well have been used to better encode parts of the spectrum that human ears would notice.

It's quite possible to generate a great looking spectrogram that yields a poor sounding file and vice-versa.

Lossy encodes *always* cause some further degradation of audio quality with each encode, so it's best to leave lossy files as is and always choose the least processed version (e.g. the 128kbps AAC (.m4a) from YouTube will always be better than any MP3 version of the same source).

Audacity, SoX, and FFmpeg are also capable of producing more accurate spectrograms than Spek (with a steeper learning curve, of course).

Much more information on this and related topics at Hydrogen Audio [0].

[0] https://hydrogenaud.io


More user-friendly option, can scan the whole library instead of having to load files one-by-one into Spek: https://fakinthefunk.net/en/index.html


+1 for spek, used it frequently in the past, very clear to see 320 re-rips that were actually lower quality


Why did you link HELP.spek.cc specifically? It appears to be an unsecured (http) outdated page with older (version 8.2/8.3) downloads.

But just spek.cc (linked on the GitHub) is a Substack site (an odd choice for a project homepage) with version 8.5 downloads.

GitHub: https://github.com/alexkay/spek Substack: https://www.spek.cc/p/download


Most of the work on objective quality metrics (e.g. PESQ, POLQA, ViSQOL, DNS-MOS, NISQA) focuses on speech because of telecommunications demands, but some of these have an audio mode. There are also some promising new audio metrics that are ML-based.

I haven't tried it, but you may want to look into PAM, which is relatively new, doesn't require a reference (you don't need the original uncompressed audio), and is open source.

However, all approaches are quite far from perfect. Human evaluation is still the gold standard.


All the truly objective metrics will require a reference file to compare against. If you manage to find such a reference file, you've kinda solved your initial problem (which, presumably, was wanting to find the best quality file to listen to).


Objective metrics for audio quality estimation are a tool commonly used to develop audio products, from mobile phones to headphones, hearing aids, and audio codecs. I worked briefly in that area some years ago and have put some notes together at: https://github.com/jonnor/machinehearing/blob/master/audio-q... In your use case you would want a metric that only needs the single audio clip, without an original/pristine reference. This is called a "reference-free" / "non-intrusive" / "single-ended" metric.

Detecting re-encoding or double encoding is sometimes researched, though mostly for audio forensic purposes.

Conceptually, it would be possible to encode a sizable corpus of music with different codecs and bitrates/settings, and train an ML model to identify the "true" bitrate of new, unseen audio clips.


https://friture.org/

Spectral analyzer. Not sure if you need a CLI or batch function, but the frequency will be cut off regardless of the purported bitrate, even if it was "upscaled", since those frequencies were chopped previously. You can see a sample screenshot in the upper left showing the frequency. Re-encode a 320kbps file to 128kbps and you can see the frequency range diminished in the 128kbps version.
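The cutoff can also be estimated programmatically rather than by eye. A minimal sketch (my own illustration, not part of Friture), assuming decoded samples in a NumPy array: find the highest frequency whose energy stays above a threshold relative to the spectral peak.

```python
import numpy as np

def estimate_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Highest frequency with energy above threshold_db relative to the peak."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(db > threshold_db)[0]
    return float(freqs[above[-1]]) if len(above) else 0.0

# White noise low-passed by zeroing everything above 16 kHz, mimicking the
# kind of cutoff a low-bitrate MP3 encoder applies.
rate = 44100
rng = np.random.default_rng(1)
noise = rng.normal(size=rate)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(rate, d=1.0 / rate)
spec[freqs > 16000] = 0
lowpassed = np.fft.irfft(spec)

print(estimate_cutoff_hz(lowpassed, rate))  # close to 16000
```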


I found a tool named [Lossless Audio Checker](https://losslessaudiochecker.com) : "A utility to check whether a WAVE or FLAC file is truly lossless or not".

I was so sad that this project is not open-source, but their research papers give some interesting clues about detecting bad-quality files.

On my side, I used it through a [Bash script](https://gist.github.com/madeindjs/d5e3949313b141f2e5eea62b98...) to detect bad files in my library. The tool produces a lot of false positives, since it triggered on some high-res audio I bought on Qobuz.


Not my area, but:

You could use image processing/DSP methods on a sample of spectrogram images taken from the file.

Visually, it's obvious when audio is compressed; you get "glitchy" or "smeary" repeated artefacts.

I'd also look for cuts on the high end (over ~15 kHz) that clip more than normal (compared to uncompressed).


https://andrewrondeau.com/blog/2016/07/deconstructing-lossy-...

Years ago, I did a study. I wrote a program to compare the original to the encoded version of a file. I used high-resolution DVD-A rips, to try to avoid artifacts introduced by downsampling the master to CD resolution.

The source code that I used for the above article is at: https://github.com/GWBasic/MeasureDegredation


I asked almost this exact question on Super User (part of the Stack Exchange network) a few years ago. Some of the answers are still relevant:

    How to objectively compare the sound quality of two files?
    https://superuser.com/questions/693238/how-to-objectively-co...


Worth keeping in mind that this varies wildly per domain, and there are many different measures for different types of problems.

SNR is a classic, and simple enough to give you an intuitive sense of the underlying signal processing.

https://essentia.upf.edu/reference/std_SNR.html
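For intuition, here is a toy SNR computation (my own sketch, not Essentia's implementation), treating the difference between the clean and degraded waveforms as the noise:

```python
import numpy as np

def snr_db(signal, noisy):
    """SNR in dB, treating (noisy - signal) as the noise."""
    noise = noisy - signal
    return float(10 * np.log10(np.sum(signal**2) / np.sum(noise**2)))

t = np.linspace(0, 1, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)          # mean power 0.5
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.normal(size=clean.shape)  # noise power ~0.01

print(round(snr_db(clean, noisy), 1))  # ~17 dB, i.e. 10*log10(0.5/0.01)
```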


Not an answer, just got me thinking about my _subjective_ measures of “bad music quality”. It’s not as trivial as “there is clipping”, because some elements clip on purpose, but if a vocal part clips it’s an instant turn off for me, it sounds horrible. That’s not a data issue though, it’s a production issue… unless the data is exceptionally bad.


A track which uses low-bitrate mp3 compression as an effect, specifically to get the ringing artifacts on some of the instruments, would make a fun benchmark for any quality detection method.


What an awesome thread and answers! I have no personal interest in the subject, but I had to read all the answers and even try a script a guy did here!


Listen to the decoded side channel. The more artefacts you hear, the lower the encoding bitrate that was used.

Compare to a known-good file.
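For anyone unfamiliar with mid/side decomposition: the side channel is just the difference between left and right, where joint-stereo artifacts tend to be most audible. A tiny sketch (my own, assuming decoded stereo channels as NumPy arrays):

```python
import numpy as np

def side_channel(left, right):
    """Side channel of a stereo signal: half the left/right difference."""
    return (left - right) / 2.0

# A mono source (identical channels) has a silent side channel.
t = np.linspace(0, 1, 1000, endpoint=False)
mono = np.sin(2 * np.pi * 5 * t)
print(np.allclose(side_channel(mono, mono), 0))  # True
```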


What is quality? The smallest resolvable detail?


In this case it seems the question is referring to fidelity - the file's similarity to the original uncompressed audio, after it was sampled and mastered but before lossy compression.

I would think that you could use something like RMSE [1] to measure this, but I'm not experienced in this kind of thing.

[1] https://en.wikipedia.org/wiki/Root_mean_square_deviation
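RMSE between a reference and a decoded waveform is straightforward to compute (a sketch with made-up sample values; note it only works with an aligned reference, and correlates poorly with perceived quality, as other comments point out):

```python
import numpy as np

def rmse(reference, decoded):
    """Root-mean-square error between two aligned sample arrays."""
    return float(np.sqrt(np.mean((reference - decoded) ** 2)))

ref = np.array([0.0, 1.0, -1.0, 0.5])
dec = np.array([0.0, 0.9, -1.0, 0.5])  # one sample off by 0.1
print(rmse(ref, dec))  # 0.05
```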


Answers here show HN at its finest.


Yes. But this is also the kind of thing that should be googled, or queried to an LLM, as a first pass.


Why do you feel the need to police people asking technical questions on HN under an "Ask HN" tag?


Why?



