The technique described in the article seems to use this key-value pair to store pointers to the additional metadata (in this case, a distinct-value index) embedded in the file. Note that we can embed arbitrary binary data in a Parquet file between data pages. This is perfectly valid, since all Parquet readers rely on the exact data-page offsets recorded in the footer.
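As a minimal sketch (using the Rust `parquet` crate, not the blog's actual code) of what recording such a pointer might look like: the key name `my_index_location` and the offset format are invented for illustration, and actually writing the index bytes between the data pages requires lower-level control of the underlying writer than shown here.

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::file::properties::WriterProperties;
use parquet::format::KeyValue;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let ids: ArrayRef = Arc::new(Int64Array::from(vec![1, 2, 3]));
    let batch = RecordBatch::try_from_iter([("id", ids)])?;

    // Hypothetical pointer to index bytes embedded elsewhere in the file;
    // the key name and value format here are made up for illustration.
    let props = WriterProperties::builder()
        .set_key_value_metadata(Some(vec![KeyValue::new(
            "my_index_location".to_string(),
            "offset=1234,len=567".to_string(),
        )]))
        .build();

    let file = File::create("data.parquet")?;
    let mut writer = ArrowWriter::try_new(file, batch.schema(), Some(props))?;
    writer.write(&batch)?;
    writer.close()?; // footer (including the key-value pairs) is written here
    Ok(())
}
```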
This means that DataFusion does not need to specify how the metadata is interpreted; the key-value mechanism is already well specified as part of the Parquet file format itself. DataFusion is an independent project -- it is a query execution engine for OLAP / columnar data, which can take in SQL statements, build query plans, optimize them, and execute them. It is an embeddable runtime with numerous ways for the host program to extend it. Parquet is a file format supported by DataFusion because it is one of the most popular ways of storing columnar data in object stores like S3.
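For anyone unfamiliar with it, embedding DataFusion looks roughly like this (the file name `data.parquet` is just a placeholder):

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // The host program embeds the engine, registers a Parquet file as a
    // table, then lets DataFusion plan, optimize, and execute a SQL query.
    let ctx = SessionContext::new();
    ctx.register_parquet("t", "data.parquet", ParquetReadOptions::default())
        .await?;
    ctx.sql("SELECT id, count(*) FROM t GROUP BY id")
        .await?
        .show()
        .await?;
    Ok(())
}
```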
Note that a Parquet reader needs to be aware of any extra metadata in order to exploit it. But if not, nothing changes: as long as we embed only supplementary information such as indexes or bloom filters, a reader can keep working with the columnar data in the Parquet file exactly as it used to; it just won't be able to take advantage of the additional metadata.
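Continuing the hypothetical sketch from above, an index-aware reader's lookup might look like the following; a reader that has never heard of the (made-up) `my_index_location` key simply never consults it and reads the data pages as usual.

```rust
use std::fs::File;
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("data.parquet")?;
    let reader = SerializedFileReader::new(file)?;

    // Scan the footer's key-value metadata for the hypothetical pointer
    // left by the writer; unaware readers skip this step entirely.
    let footer = reader.metadata().file_metadata();
    if let Some(kvs) = footer.key_value_metadata() {
        for kv in kvs {
            if kv.key == "my_index_location" {
                println!("index pointer: {:?}", kv.value);
            }
        }
    }
    Ok(())
}
```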
> Note that a Parquet reader needs to be aware of any extra metadata in order to exploit it. But if not, nothing changes
The one downside of this approach, which is likely obvious but I haven't seen mentioned, is that the resulting Parquet files are larger than they would be otherwise, and the increased size only benefits engines that know how to interpret the new index.
There is no spec. Personally, I hope the existing indexes (bloom filters, zone maps) get redesigned to fit into a paradigm where Parquet itself has more first-class support for multiple levels of indexes embedded in the file, along with conventions for how those common index types are stored. That is, start with the Wild West and define specs as needed.
> That is, start with the Wild West and define specs as needed
Yes, this is my personal hope as well -- if new index types become widespread, they can be incorporated formally into the spec.
However, changing the spec is a non-trivial process and requires significant consensus-building and engineering.
Thus, the methods described in the blog can be used to add indexes ahead of any spec change, and potentially as a way to prototype and prove out new index types.