Ask HN: Determine sound quality of a file, objectively?
65 points by ggeorgovassilis on Aug 19, 2024 | 45 comments
Is there a way to "objectively" determine the sound quality of an audio file? E.g. I sometimes come across MP3 files with a high bitrate, but they sound bad, which suggests that they were re-encoded from a bad/low-bitrate source. By "objectively" I mean something other than listening to them, e.g. an audio spectrogram, etc.


> I sometimes come across mp3 files with a high bitrate, but they sound bad which suggest that they were re-encoded from a bad/low-bitrate source.

You could try re-compressing the mp3 file to lower and lower bitrates and check the amplitude of differences. Since mp3 is a lossy codec, there will always be a slight difference, but you should see a sudden increase in difference when you surpass the "true" encoding bitrate.

You could probably write a script for it using ffmpeg and some other tools to generate a bitrate-difference chart.


Okay, I got nerd-sniped [0]. Here's the script I made:

    https://gist.github.com/paskozdilar/6095fe73c80ad21fda3f518177699149
This isn't an exact method, so it will only print bitrates and their respective "difference" parameter. In general, as the difference parameter rises, you're more likely to have reached the "true" mp3 quality, and a sudden jump in difference parameter is an almost certain indicator of true quality.

E.g. for a System Of A Down song "36", encoded at 320kbps, I get the following output:

    user@hostname:~/temp$ ./main.sh 36.mp3
    320: 0.075272
    256: 0.121475
    224: 0.160858
    192: 0.193726
    160: 0.237717
    128: 0.308029
    112: 0.363953
     96: 0.409012
     80: 0.454941
     64: 0.578598
     56: 1.105850
     48: 1.081100
     40: 0.629898
     32: 1.223129
Here, we have a jump immediately after 320kbps, which shows that this file is true 320kbps.

When I compressed the song to 64kbps and then re-compressed it to 320kbps, I get this result:

    user@hostname:~/temp$ ./main.sh 36-reencoded.mp3 
    320: 0.146484
    256: 0.146484
    224: 0.146484
    192: 0.146484
    160: 0.146484
    128: 0.146957
    112: 0.178665
     96: 0.210373
     80: 0.175171
     64: 0.211609
     56: 1.054886
     48: 0.944687
     40: 0.505280
     32: 0.852020
As we can see, the most significant jump happens from 64kbps to 56kbps, which confirms that this file's true bitrate is indeed 64kbps.
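To make the heuristic mechanical, one could pick the bitrate just above the largest jump in the difference parameter. A rough sketch (my own illustration, not part of the gist above), using the numbers from the re-encoded file:

```python
# (bitrate_kbps, difference) pairs copied from the re-encoded file's output.
results = [
    (320, 0.146484), (256, 0.146484), (224, 0.146484), (192, 0.146484),
    (160, 0.146484), (128, 0.146957), (112, 0.178665), (96, 0.210373),
    (80, 0.175171), (64, 0.211609), (56, 1.054886), (48, 0.944687),
    (40, 0.505280), (32, 0.852020),
]

def estimate_true_bitrate(results):
    """Estimate the true bitrate as the one sitting just above the
    largest jump in difference between adjacent bitrate steps."""
    jumps = [
        (results[i + 1][1] - results[i][1], results[i][0])
        for i in range(len(results) - 1)
    ]
    return max(jumps)[1]

print(estimate_true_bitrate(results))  # 64
```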

Though, as a sibling comment says, it's not confirmed that this kind of process works across encoders. I think it should, because MP3, as a lossy codec, mostly removes higher frequencies, and re-encoding a compressed signal at the same bitrate should remove fewer of them, because there are fewer to begin with. But I have no way to confirm this - I'd need an array of encoders to actually verify it.

[0] https://xkcd.com/356/


What an awesome thread. Someone posts a reasonable assumption, another one quickly validates it with a simple script. This is why I love Hacker News.


If you check the usernames, the one who suggested and the one who followed up are the same person. :-)


OMG, you’re right. They nerd-sniped themselves.


A tale as old as time, hahaha


Looks like there's a pretty big jump at 56 in the first version as well.


Seems that way. Probably because I'm using the maximum delta of the difference as the metric. Using some integral-like function (e.g. the average amplitude of all samples) would probably yield better results, but I have a hard time getting any useful data out of this script on my home computer. Weird.

I'll possibly work more on this topic, maybe I'll make a HN post.
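To illustrate why the metric choice matters, here's a toy comparison (my own sketch, not from the script) of the max-delta metric against a mean-absolute-difference metric. A single outlier sample dominates max-delta but barely moves the average:

```python
import numpy as np

def max_delta(a, b):
    # Metric used by the script: largest absolute sample difference.
    return float(np.max(np.abs(a - b)))

def mean_abs(a, b):
    # "Integral-like" alternative: average absolute sample difference.
    return float(np.mean(np.abs(a - b)))

reference = np.zeros(1000)
smooth = np.full(1000, 0.01)   # uniform small error everywhere
spiky = np.zeros(1000)
spiky[0] = 10.0                # one huge outlier; mean error is still 0.01

print(max_delta(reference, smooth), max_delta(reference, spiky))  # 0.01 10.0
print(mean_abs(reference, smooth), mean_abs(reference, spiky))    # 0.01 0.01
```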


That sounds right. It should go from all small steps to all big steps. Every step past the real bitrate is shucking data humans can't hear well, more and more aggressively. Every step before the real bitrate is looking for data to shuck but not finding it.

I just realized, social media sites may do something similar to save space. I've noticed whenever I upscale my video resolution, they aren't fooled. At first the video will be hosted in 4k, but later that option disappears, and only the 1080P and lower resolutions are left. But sharp footage from my real 4k camera stays hosted at 4k. I figured they must take a screenshot and determine the blurriness, but now I think they might just look at those re-encoding stats they already have.


That's more than I ever could have asked for. Amazing work, thank you so much! What's the script's license?


> What's the script's license?

For legal purposes, let's say GNU All-permissive License. I promise not to sue :')


Does this work if the encoders are different?

Are encoders aware that you are transcoding rather than working on raw waveform data? Presumably if working on waveform data the higher bitrates will seek to maintain the artifacts due to the earlier transcoding. Can one rely this much on the behaviour of an encoder?


> Does this work if the encoders are different?

I have no idea, honestly.

I've tested my script (from a sibling comment of yours) with twolame [0] encoder, and I don't know if the results are good enough to indicate 64kbps true bitrate:

    user@host:~/temp$ ./main.sh 36-twolame-reencoded.mp3 
    320: 0.032303
    256: 0.045563
    224: 0.059769
    192: 0.074631
    160: 0.204269
    128: 0.229187
    112: 0.309540
     96: 0.302414
     80: 0.310181
     64: 0.367279
     56: 1.133713
     48: 1.106262
     40: 0.444473
     32: 1.030777
[0] https://www.twolame.org/


Not all MP3 codecs are the same, and some include a feature called perceptual compression. Thus, the predominant features of the audio one actually hears are preserved over high-frequency overtones most people find unpleasant. As a result, the self-similarity of the small file is improved, while the perceived sound quality is better at lower bitrates (note that not all psychoacoustic models are the same).

This is why some encoders sound like garbage even at relatively high encoded quality levels.

Let's be clear: most people can't hear the difference (e.g. digital synth music makes smaller files with duplicate audio), and others simply adapt to poor sound quality, given they hit the $2.34 hardware mixer chip's interpolation limitations long before the audio codec's limits (e.g. Bluetooth can be really iffy).

Don't worry about it, and maybe pick up a stringed instrument if you want a quality experience =3


A private music tracker called REDacted has a good overview [0] on this subject (link should be SFW, and is on a different domain than the tracker itself). What.CD, before its demise, was the first I had seen demonstrating this, but older trackers (OiNK, Waffles.fm, et al.) may have also had it.

Objectivity aside, IMO the easiest way to tell when listening is paying attention to high-frequency sounds, especially hi-hat cymbals. Unless lossy encoders have gotten remarkably better since I last tried, there’s always a marked loss of shimmer / reverb on those.

[0]: https://interviewfor.red/en/spectrals.html


Spectrals are easily the best method. It’ll give you a subjective idea, but it’s never going to be truly perfect with lossy encodes. If you’re looking at FLAC then you can tell pretty easily if something is a lossy transcode, but otherwise knowing if something is 192kbps vs 128kbps is going to be difficult. For really low bit rate reencoded as 320 that should be obvious on the spectrals, but it won’t tell you anything definitive beyond “well, it was a much worse bit rate previously”… which is useful, but only to a point.


Great post, thank you! I was expecting to see frequency blocks because of quantization, but I'm positively surprised that there are MP3 compression tell-tales like the cut-off and gaps.


https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Audio...

PEAQ is an algorithm and scoring system that takes psychoacoustic modeling into account. When I looked into this more than ten years ago, I managed to find a command line utility called pqevalaudio or something that I could just use to assign a score to a file.


Normally those algorithms expect a clean reference version of the audio, as they quantify impairment. It doesn't sound like the OP has a reference version.



Interesting.

The README was pretty sparse and I wanted to read more about what was actually going on, so I poked around a bit. First of all, wikipedia article if you just want a birds eye view [0].

The website the README links to as the original isn’t secure, but you can read the paper talking about this particular way of measuring perceptual audio quality here if you don’t mind living a little dangerously[1]. I also found this paper, which I haven’t had a chance to read yet, but it gives a “zoomed out” view, looking at a bunch of different perceptual audio quality measures [2].

[0] https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Audio...

[1] https://www-mmsp.ece.mcgill.ca/Documents/Reports/2002/KabalR...

[2] https://dl.acm.org/doi/pdf/10.1109/TASLP.2021.3069302


Try Spek [0]

It's a popular tool for verifying the quality of music downloaded via P2P platforms. Re-encoding YouTube rips is popular - especially bootleg tracks - and this helps weed them out.

[0] http://help.spek.cc/


Spectrograms are not always a reliable way to judge audio quality of lossy-encoded files.

Lossy codecs use psychoacoustic models which are designed to save space by discarding data that's unlikely to be missed by humans - often frequencies that would be masked by other sounds.

This process often results in what looks like 'holes' in the spectrogram, if compared to a lossless 'smooth' version.

Most people then (wrongly) assume that the file sounds bad, when actually the bits saved there may well have been used to better encode parts of the spectrum that human ears would notice.

It's quite possible to generate a great looking spectrogram that yields a poor sounding file and vice-versa.

Lossy encodes *always* cause some further degradation of audio quality with each encode, so it's best to leave lossy files as is and always choose the least processed version (e.g. the 128kbps AAC (.m4a) from YouTube will always be better than any MP3 version of the same source).

Audacity, SoX, and FFmpeg are also capable of producing more accurate spectrograms than Spek (with a steeper learning curve, of course).

Much more information on this and related topics at Hydrogen Audio [0].

[0] https://hydrogenaud.io


More user-friendly option, can scan the whole library instead of having to load files one-by-one into Spek: https://fakinthefunk.net/en/index.html


+1 for spek, used it frequently in the past, very clear to see 320 re-rips that were actually lower quality


Why did you link HELP.spek.cc specifically? It appears to be an unsecured (http) outdated page with older (version 8.2/8.3) downloads.

But just spek.cc (linked on the GitHub) is a Substack site (an odd choice for a project homepage) with version 8.5 downloads.

GitHub: https://github.com/alexkay/spek Substack: https://www.spek.cc/p/download


Most of the work on objective quality metrics (e.g. PESQ, POLQA, ViSQOL, DNS-MOS, NISQA) focuses on speech because of telecommunications demands, but some of these have an audio mode. There are also some promising new audio metrics that are ML-based.

I haven't tried it, but you may want to look into PAM, which is relatively new, doesn't require a reference (you don't need the original uncompressed audio), and is open source.

However, all approaches are quite far from perfect. Human evaluation is still the gold standard.


All the truly objective metrics will require a reference file to compare against. If you manage to find such a reference file, you've kinda solved your initial problem (which, presumably, was wanting to find the best quality file to listen to).


Objective metrics for audio quality estimation are a tool commonly used to develop audio products, from mobile phones to headphones, hearing aids, and audio codecs. I worked briefly in that area some years ago and have put some notes together at: https://github.com/jonnor/machinehearing/blob/master/audio-q... In your use case you would want a metric that only needs the single audio clip, without an original/pristine reference. This is called a "reference-free" / "non-intrusive" / "single-ended" metric.

Detecting re-encoding or double encoding is sometimes researched, though mostly for audio forensic purposes.

Conceptually, it would be possible to encode a sizable corpus of music with different codecs and bitrates/settings, and train an ML model to identify the "true" bitrate of new, unseen audio clips.


https://friture.org/

Spectral analyzer. Not sure if you need a CLI or batch function, but the frequency will be cut off regardless of the purported bitrate, even if it was "upscaled", since those frequencies were chopped previously. You can see a sample screenshot in the upper left showing the frequency. Re-encode a 320kbps file to 128kbps and you can see the frequency range diminished in the 128kbps version.
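The cutoff can also be estimated programmatically rather than by eye. A minimal sketch (my own illustration, not part of Friture), assuming decoded samples in a NumPy array: find the highest frequency whose energy stays above a threshold relative to the spectral peak.

```python
import numpy as np

def estimate_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Highest frequency with energy above threshold_db relative to the peak."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(db > threshold_db)[0]
    return float(freqs[above[-1]]) if len(above) else 0.0

# White noise low-passed by zeroing everything above 16 kHz, mimicking the
# kind of cutoff a low-bitrate MP3 encoder applies.
rate = 44100
rng = np.random.default_rng(1)
noise = rng.normal(size=rate)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(rate, d=1.0 / rate)
spec[freqs > 16000] = 0
lowpassed = np.fft.irfft(spec)

print(estimate_cutoff_hz(lowpassed, rate))  # close to 16000
```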


I found a tool named [Lossless Audio Checker](https://losslessaudiochecker.com) : "A utility to check whether a WAVE or FLAC file is truly lossless or not".

I was so sad that this project is not open-source, but their research papers give some interesting clues about detecting bad-quality files.

On my side, I used it through a [Bash script](https://gist.github.com/madeindjs/d5e3949313b141f2e5eea62b98...) to detect bad files in my library. The tool produces a lot of false positives, since it triggered on some high-res audio I bought on Qobuz.


Not my area, but:

You could use image processing/DSP methods on a sample of spectrogram images taken from the file.

Visually, it's obvious when audio is compressed; you get "glitchy" or "smeary" repeated artefacts.

I'd also look for cuts on the high end (over ~15 kHz) that clip more than normal (compared to uncompressed).


https://andrewrondeau.com/blog/2016/07/deconstructing-lossy-...

Years ago, I did a study. I wrote a program to compare the original to the encoded version of a file. I used high-resolution DVD-A rips, to try to avoid artifacts introduced by downsampling the master to CD resolution.

The source code that I used for the above article is at: https://github.com/GWBasic/MeasureDegredation


I asked almost this exact question on Super User (part of the Stack Exchange network) a few years ago. Some of the answers are still relevant:

    How to objectively compare the sound quality of two files?
    https://superuser.com/questions/693238/how-to-objectively-co...


Worth keeping in mind that this varies wildly per domain, and there are many different measures for different types of problems.

SNR is a classic, and simple enough to give you an intuitive sense of the underlying signal processing.

https://essentia.upf.edu/reference/std_SNR.html
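For intuition, here is a toy SNR computation (my own sketch, not Essentia's implementation), treating the difference between the clean and degraded waveforms as the noise:

```python
import numpy as np

def snr_db(signal, noisy):
    """SNR in dB, treating (noisy - signal) as the noise."""
    noise = noisy - signal
    return float(10 * np.log10(np.sum(signal**2) / np.sum(noise**2)))

t = np.linspace(0, 1, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)          # mean power 0.5
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.normal(size=clean.shape)  # noise power ~0.01

print(round(snr_db(clean, noisy), 1))  # ~17 dB, i.e. 10*log10(0.5/0.01)
```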


Not an answer, just got me thinking about my _subjective_ measures of “bad music quality”. It’s not as trivial as “there is clipping”, because some elements clip on purpose, but if a vocal part clips it’s an instant turn off for me, it sounds horrible. That’s not a data issue though, it’s a production issue… unless the data is exceptionally bad.


A track which uses low-bitrate mp3 compression as an effect, specifically to get the ringing artifacts on some of the instruments, would make a fun benchmark for any quality detection method.


What an awesome thread and answers! I have no personal interest in the subject, but I had to read all the answers and even try a script a guy did here!


Listen to the decoded side channel. The more artefacts you hear, the lower the encoding bitrate that was used.

Compare to a known-good file.
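For anyone unfamiliar with mid/side decomposition: the side channel is just the difference between left and right, where joint-stereo artifacts tend to be most audible. A tiny sketch (my own, assuming decoded stereo channels as NumPy arrays):

```python
import numpy as np

def side_channel(left, right):
    """Side channel of a stereo signal: half the left/right difference."""
    return (left - right) / 2.0

# A mono source (identical channels) has a silent side channel.
t = np.linspace(0, 1, 1000, endpoint=False)
mono = np.sin(2 * np.pi * 5 * t)
print(np.allclose(side_channel(mono, mono), 0))  # True
```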


What is quality? The smallest resolvable detail?


In this case it seems the question is referring to fidelity - the file's similarity to the original uncompressed audio, after it was sampled and mastered but before lossy compression.

I would think that you could use something like RMSE [1] to measure this, but I'm not experienced in this kind of thing.

[1] https://en.wikipedia.org/wiki/Root_mean_square_deviation
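RMSE between a reference and a decoded waveform is straightforward to compute (a sketch with made-up sample values; note it only works with an aligned reference, and correlates poorly with perceived quality, as other comments point out):

```python
import numpy as np

def rmse(reference, decoded):
    """Root-mean-square error between two aligned sample arrays."""
    return float(np.sqrt(np.mean((reference - decoded) ** 2)))

ref = np.array([0.0, 1.0, -1.0, 0.5])
dec = np.array([0.0, 0.9, -1.0, 0.5])  # one sample off by 0.1
print(rmse(ref, dec))  # 0.05
```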


Answers here show HN at its finest.


Yes. But this is also the kind of thing that should be googled, or queried to an LLM, as a first pass.


Why do you feel the need to police people asking technical questions on HN under an "Ask HN" tag?


Why?



