Giving it a quick look, it seems like they've addressed a lot of the shortcomings of Parquet, which is very exciting. In no particular order:
- Parquet metadata is Thrift, but with comments saying "if this field exists, this other field must exist" and no code actually verifying it, so I'm pretty sure you could feed a reader bogus Thrift metadata and crash it.
- Parquet metadata must be parsed out, meaning you have to allocate a buffer, read the metadata bytes, and then keep dynamically allocating a whole bunch of stuff as you parse them, since you don't know the size of the materialized metadata up front. Too many heap allocations! This file format's FlatBuffers approach seems to solve this, since you can interpret FlatBuffers bytes in place (see the parse-vs-view sketch after this list).
- The encodings are much more powerful. A lot of people in the database community have been saying for a long time that we need composable/recursive lightweight encodings. BtrBlocks was the first such open format in my memory, and FastLanes followed. Both were much better than Parquet by itself, so I'm glad ideas from those two formats are being taken up (see the toy encoding sketch after this list).
- Parquet did the Dremel record-shredding thing which just made my brain explode and I'm glad they got rid of it. It seemed to needlessly complicate the format with no real benefit.
- Parquet data pages might contain different numbers of rows, so you have to scan the whole ColumnChunk to find the row you want. Here it seems like you can jump straight to the DataPage (IOUnit) you want (see the arithmetic sketch after this list).
- They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, and was super annoying to implement, and basically required you to pull in 20 dependencies.
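Since a couple of the points above are about metadata layout, here's a minimal sketch of the parse-vs-view difference. The structs are purely illustrative, not the actual Thrift or FlatBuffers APIs:

```rust
// Parse-style (Thrift-like): reading metadata materializes owned heap
// objects of unknown total size, one allocation at a time.
#[allow(dead_code)]
struct ParsedColumnMeta {
    name: String, // fresh heap allocation per field
    num_values: i64,
}

// View-style (FlatBuffers-like): a zero-copy view over the raw bytes;
// accessors decode fields in place, so nothing is allocated up front.
struct FlatColumnMeta<'a> {
    buf: &'a [u8],
    offset: usize, // where this record starts in the buffer
}

impl<'a> FlatColumnMeta<'a> {
    fn num_values(&self) -> i64 {
        // Hypothetical fixed field offset; real FlatBuffers uses vtables.
        let bytes = &self.buf[self.offset..self.offset + 8];
        i64::from_le_bytes(bytes.try_into().unwrap())
    }
}
```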
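And a toy version of what composable/recursive encodings mean in practice: each encoding's input can itself be encoded, so you can stack e.g. RLE on top of dictionary codes. The names are made up for illustration, not the actual spec of BtrBlocks, FastLanes, or this format:

```rust
enum Encoding {
    Plain(Vec<i64>),
    // Run-length: run values are themselves encoded.
    Rle { values: Box<Encoding>, run_lengths: Vec<u32> },
    // Dictionary: codes index into a dictionary that is itself encoded.
    Dict { codes: Vec<u32>, dictionary: Box<Encoding> },
    // Delta: a first value plus an encoded stream of differences.
    Delta { first: i64, deltas: Box<Encoding> },
}

fn decode(e: &Encoding) -> Vec<i64> {
    match e {
        Encoding::Plain(v) => v.clone(),
        Encoding::Rle { values, run_lengths } => decode(values)
            .into_iter()
            .zip(run_lengths.iter())
            .flat_map(|(v, n)| std::iter::repeat(v).take(*n as usize))
            .collect(),
        Encoding::Dict { codes, dictionary } => {
            let dict = decode(dictionary);
            codes.iter().map(|c| dict[*c as usize]).collect()
        }
        Encoding::Delta { first, deltas } => {
            let mut acc = *first;
            let mut out = vec![acc];
            for d in decode(deltas) {
                acc += d;
                out.push(acc);
            }
            out
        }
    }
}
```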
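The fixed-rows-per-page point boils down to replacing a scan with arithmetic. A hypothetical sketch, assuming every page holds the same number of rows:

```rust
/// With a fixed `rows_per_page`, locating a row is O(1):
/// returns (page index, offset within that page).
fn locate(row: u64, rows_per_page: u64) -> (u64, u64) {
    (row / rows_per_page, row % rows_per_page)
}

// In Parquet you instead walk the page headers, accumulating row
// counts, until you reach the page that covers the target row.
```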
Overall great improvement, I'm looking forward to this taking over the data analytics space.
> - They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, and was super annoying to implement, and basically required you to pull in 20 dependencies.
"Heavyweight compression" as in zstd and brotli? That's very useful for columns of non-repeated strings. I get compression ratios in the order of 1% on some of those columns, because they are mostly ASCII and have lots of common substrings.
I think the wasm compiler is going to bring in more dependencies than the ‘heavy’ compression would have.
I think that more expensive compression may have made more of a difference 15 years ago, when CPU was more plentiful compared to network or disk bandwidth.
I think it’s a pretty common choice when you want compression in a new format or protocol. It works better to compress chunks of your data rather than one large file when you want to maintain some kind of index or random access. Similarly, if you have many chunks you can parallelise decompression (I’m not sure any kind of parallelism support should have been built into the zstd format itself, though it is useful for command-line use).
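A sketch of the chunking idea, again assuming the Rust `zstd` crate and an invented layout: each chunk is compressed independently, so you can read one chunk without touching the rest, or decompress many in parallel:

```rust
use std::io;

// Compress fixed-size chunks independently; an offset table over the
// results gives random access into the compressed data.
fn compress_chunks(data: &[u8], chunk_size: usize) -> io::Result<Vec<Vec<u8>>> {
    data.chunks(chunk_size)
        .map(|chunk| zstd::encode_all(chunk, 3))
        .collect()
}

// Reading one chunk touches only that chunk's bytes, not the whole stream.
fn read_chunk(chunks: &[Vec<u8>], i: usize) -> io::Result<Vec<u8>> {
    zstd::decode_all(&chunks[i][..])
}
```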
A big problem for some people is that Java support is hard, since zstd isn’t portable (the usual bindings wrap native code), so e.g. making a Java web server compress its responses with zstd isn’t so easy.
Sure, I don't want to make a big deal about this but I have observed Java projects choosing to not support zstd for portability (or software packaging) reasons.
Depends on the use-case. For transparent filesystem compression I would still recommend lz4 over zstd because speed matters more than compression ratio in that use case.