
Here are some examples of what I was doing in one case

https://www.hydrogen18.com/blog/apk-the-strangest-format.htm...

I was running "zstd --ultra --threads=0", which I assumed was asking it for the absolute maximum compression.



I think your mistake was to use --ultra without a compression level.

I redid your experiments with rust-wasm-1.83.0-r0.apk:

                            size       perc   c.time  d.time
    uncompressed:      290072064          -        -       -
    gzipped original:  105255109     36.29%        -       -
    bzip2 -9:          107099379     36.92%    21.1s   11.0s
    bzip3 -b511:        73539847     25.35%    28.9s   32.0s
    xz --extreme -9:    71010672     24.48%   142.0s    3.1s
    lzip -9:            70964413     24.46%   173.5s    5.3s
    zstd --ultra -22:   48288499     16.64%   155.6s    0.4s

It's pretty clear zstd blows everything else out of the water by a huge margin. And even though compressing with zstd is slightly slower than xz in this case (by less than 10%), decompression is nearly 8x as fast, and you can probably tweak the compression level to make zstd be both faster and better than xz.
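
For anyone who wants to check the relative figures, here is a small sketch that recomputes them from the xz and zstd rows of the table. It is only arithmetic on the numbers as posted; the variable names are my own and nothing is re-measured.

```python
# Sizes (bytes) and times (seconds) copied from the table in this comment.
results = {
    "xz --extreme -9":  {"size": 71010672, "c_time": 142.0, "d_time": 3.1},
    "zstd --ultra -22": {"size": 48288499, "c_time": 155.6, "d_time": 0.4},
}

xz = results["xz --extreme -9"]
zstd = results["zstd --ultra -22"]

# How much slower zstd compressed, relative to xz.
compress_slowdown = zstd["c_time"] / xz["c_time"] - 1.0
# How many times faster zstd decompressed.
decompress_speedup = xz["d_time"] / zstd["d_time"]
# How much smaller the zstd output is than the xz output.
size_reduction = 1.0 - zstd["size"] / xz["size"]

print(f"zstd compression slowdown vs xz: {compress_slowdown:.1%}")
print(f"zstd decompression speedup vs xz: {decompress_speedup:.2f}x")
print(f"zstd output smaller than xz output by: {size_reduction:.1%}")
```

That works out to roughly a 10% compression slowdown, a bit under 8x faster decompression, and an output about a third smaller than xz's.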


That was an impressive result, so I tried it on a huge email inbox.

    uncompressed:    1512662084
    xz --extreme -9:  508431572  12:47
    zstd --ultra -21: 508432560  12:44
(-22 ran out of memory.) So at least for me, zstd was identical to xz almost to the byte and to the second.


It does really vary based on the data set.

If the email data is mostly text with markup (like HTML/XML), you might want to try bzip3 too.

It's also possible that a large part of your email is actually already-compressed binary data (like PDFs and images) possibly encoded in base-64. In that case it's likely that all tools are pretty good at compressing the text and headers, but can do little to compress the attachments, which would explain why the results you get are so close.
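
To illustrate the effect, here is a minimal sketch using only Python's standard library. The data is an assumption, not your mailbox: seeded random bytes stand in for an already-compressed attachment, and a repeated made-up header line stands in for the text parts. zlib and lzma are used because they ship with Python; I would expect zstd to behave similarly on this kind of input, but that is not tested here.

```python
import base64
import lzma
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for an already-compressed attachment: random bytes are
# essentially incompressible, much like JPEG or PDF stream data.
attachment = bytes(rng.getrandbits(8) for _ in range(200_000))
encoded = base64.encodebytes(attachment)  # MIME-style base64 with line breaks

# Stand-in for headers and body text: highly repetitive (hypothetical line).
text = b"Received: from mail.example.org by mx.example.org; Mon, 1 Jan 2024\r\n" * 3000

COMPRESSORS = (
    ("zlib -9", lambda d: zlib.compress(d, 9)),
    ("xz -9", lambda d: lzma.compress(d, preset=9)),
)

def report(label, data):
    """Print input size, output size and ratio for each compressor."""
    for name, compress in COMPRESSORS:
        out = compress(data)
        print(f"{label:8s} {name:8s} {len(data):>8d} -> {len(out):>8d} "
              f"({len(out) / len(data):.1%})")

report("base64", encoded)
report("text", text)
```

The base64 block can shrink back to about the size of the raw attachment (undoing the 4/3 expansion of the encoding) but not below it, while the text collapses to almost nothing. If a mailbox is dominated by attachments, every tool ends up near the same floor.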


    bzip3 -b511: 580771424  8:51
I suspect your theory about compressed attachments is correct, although bzip3 isn't doing very well compared to the rest.


Interesting--thanks for checking! I had good experiences with bzip3 compressing Wikipedia XML dumps, to the point it even outperformed xz, so I thought something similar might happen here. Compression does remain a bit of a black art, where it's hard to predict what works without trying it out.

Overall I'm still slightly biased towards using zstd as a default, in that I believe:

  1. zstd will almost always be among the fastest formats for decompression, which is obviously nice to have, everything else being equal.
  2. zstd can achieve a very high compression ratio, depending on tuning; rarely will zstd significantly underperform the next best option.

That makes a pretty good case for using zstd by default, even if in some cases it's not noticeably better than other formats. In your case, xz seems to be just as good.


I got -22 to run:

    zstd --ultra -22: 494517545 14:00
Pretty minor difference.


I guess I misunderstood the man page for that option then.


Yup, you should have just tried a few different -NN levels, and you would have noticed. I gave a talk on zstd a couple of years back, and one of the points was that it was better than xz across the board.
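
A rough sketch of such a level sweep, assuming the zstd CLI is on your PATH. As I read the man page, --ultra only unlocks levels 20 to 22 and does not select a level by itself, so without a -NN you still get the default level. The helper names (zstd_argv, sweep) and the chosen levels are my own, and the whole file is read into memory, which will not be pleasant for a multi-gigabyte input.

```python
import shutil
import subprocess
import sys
import time

def zstd_argv(level, threads=0):
    """Build a zstd command line that compresses stdin to stdout."""
    if not 1 <= level <= 22:
        raise ValueError("level must be between 1 and 22")
    argv = ["zstd", f"-{level}", f"--threads={threads}", "-c"]
    if level >= 20:
        # --ultra unlocks levels 20-22; it does not pick a level on its own.
        argv.insert(1, "--ultra")
    return argv

def sweep(path, levels=(3, 9, 15, 19, 22)):
    """Compress `path` at each level and print size and wall-clock time."""
    with open(path, "rb") as f:
        data = f.read()
    for level in levels:
        start = time.perf_counter()
        out = subprocess.run(zstd_argv(level), input=data,
                             stdout=subprocess.PIPE, check=True).stdout
        elapsed = time.perf_counter() - start
        print(f"-{level:<3d} {len(out):>12d} bytes  {elapsed:8.1f}s")

if __name__ == "__main__":
    # Only runs when given a file argument and zstd is actually installed.
    if len(sys.argv) == 2 and shutil.which("zstd"):
        sweep(sys.argv[1])
```

Run it as a script with the file to compress as its only argument; it prints one line per level so the size/time trade-off is visible at a glance.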



