
> Currently, the Git version used to produce the release tarballs (on the above-mentioned “clean” box) is too old to create reproducible .tar.gz tarballs, but it will create reproducible .tar.bz2 tarballs.

What is different about gzip and bzip2 that causes this?



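Gzip just isn't specified well enough to be fully reproducible. Gzip is technically only a file format specification: any encoder that emits a valid DEFLATE stream conforms, so different implementations, versions, or compression settings can produce different bytes for the same input.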

Also, the gzip header includes metadata about the compression, like a timestamp for when the file was compressed and a field for which OS was in use, so the output is never 100% the same, though it ought to be easy to work around that part.
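To make the header part concrete, here is a rough Python sketch that decodes the fixed 10-byte gzip header as laid out in RFC 1952. The function name `read_gzip_header` is made up for illustration, and it only looks at the fixed fields, not optional ones like the stored filename.

```python
import gzip
import struct


def read_gzip_header(blob: bytes) -> dict:
    """Decode the fixed 10-byte gzip header (RFC 1952).

    Hypothetical helper for illustration only.
    """
    # Layout: magic (2 bytes), compression method, flags,
    # mtime (4 bytes, little-endian), extra flags, OS identifier.
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<HBBIBB", blob[:10])
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as a little-endian short
        raise ValueError("not a gzip stream")
    return {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl, "os": os_id}


# Same input, different timestamps: the compressed files differ.
a = gzip.compress(b"hello", mtime=1)
b = gzip.compress(b"hello", mtime=2)
print(read_gzip_header(a))
print(read_gzip_header(b))
print(a == b)
```

The `mtime` keyword on `gzip.compress` needs Python 3.8 or newer. The value reported for the OS field depends on the Python and zlib versions in use, which is exactly the kind of variation being discussed here.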


Git 2.38.0 is the version where git archive started using an internal gzip implementation instead of calling the external gzip program. This internal implementation has two improvements for this purpose. First, it doesn't store the timestamp. You could also get that with gzip -n. (But the old git archive didn't do that, so you have to run gzip as a separate step after git archive.) Second, it stores the platform identification field as "UNIX" on all platforms, so the output is identical everywhere. There is no gzip command-line option for that, unfortunately.


Ah right. My suggestion would have been to use a small Perl or Python script or something, assuming there would be a gzip library that would let you set the proper header fields.
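Something like the following might work as that small script. It is only a sketch: `deterministic_gzip` is a made-up name, and it assumes Python 3.8+ for the `mtime` keyword. It zeroes the timestamp and then overwrites the OS byte in the header by hand, since the gzip module has no option for that. Patching that byte is safe here because the CRC in the gzip trailer covers only the uncompressed data, and Python does not emit the optional header CRC.

```python
import gzip

UNIX_OS_ID = 3  # RFC 1952 value for "Unix"


def deterministic_gzip(data: bytes, level: int = 9) -> bytes:
    """Compress with no timestamp and a fixed OS byte.

    Hypothetical helper for illustration only.
    """
    # mtime=0 means "no timestamp available", the same effect as gzip -n.
    out = bytearray(gzip.compress(data, compresslevel=level, mtime=0))
    # Byte 9 of the fixed header is the OS identifier; force it to "Unix".
    out[9] = UNIX_OS_ID
    return bytes(out)


if __name__ == "__main__":
    import sys

    # Usage (assumed): python script.py < project.tar > project.tar.gz
    sys.stdout.buffer.write(deterministic_gzip(sys.stdin.buffer.read()))
```

This only pins down the header. The DEFLATE stream itself can still differ between zlib versions or other implementations, so it would have to be run with the same library version everywhere to get byte-identical output.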


Presumably, the internal gzip lacks support for -n.

This could be fixed regardless of git version by calling external gzip after generating a plain tarball (or even decompressing the bzipped tarball).

Hmm, I wonder which approach would actually be fastest. Does cache contention break the obvious use of `tee(1)`?



