
They mention 'content-defined chunking', but as far as I understand it, that requires different chunking algorithms for different content types. Does it support plugins for chunking different file formats?


Today we just have a variation of FastCDC in production, but we have alternate experimental chunkers for some file formats (e.g., a heuristic chunker for CSV files that will enable almost-free subsampling). We hope to have them in production within the next 6 months.
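
For readers unfamiliar with content-defined chunking, here is a minimal sketch of the idea in Python. It is a simplified gear-hash chunker in the spirit of FastCDC, not the production implementation mentioned above; the function name `cdc_chunks`, the gear table, the mask, and the size limits are all assumptions chosen for illustration.

```python
import random

# Assumed gear table: 256 pseudo-random 32-bit values, seeded so runs are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data: bytes, mask: int = 0xFFC00000,
               min_size: int = 64, max_size: int = 4096) -> list[bytes]:
    """Split data at positions chosen by a rolling hash of the content.

    A cut is made when the top bits of the hash selected by `mask` are all
    zero (about once per 1024 bytes for a 10-bit mask), subject to the
    minimum and maximum chunk sizes.
    """
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        # Gear hash: after 32 shifts a byte no longer influences h, so the
        # hash depends only on the most recent 32 bytes of content.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(100_000))
    edited = b"XYZ" + original  # insert three bytes at the front

    a, b = cdc_chunks(original), cdc_chunks(edited)
    shared = set(a) & set(b)
    print(f"{len(shared)} of {len(a)} original chunks reappear after the edit")
```

Because boundaries depend on nearby bytes rather than on absolute offsets, an insertion near the start typically changes only the first chunk or so, and the remaining chunks can be deduplicated. A fixed-size chunker would shift every boundary and share nothing.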


That's interesting. Can a CSV chunker make adding a column not affect all of the chunks?


The simplest approach really is to chunk row-wise, so adding a column will unfortunately rewrite all the chunks. If you have a Parquet file, adding columns will be cheap.
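
The difference can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `row_chunks` and `column_chunks` are hypothetical helpers, rows are grouped in fixed batches rather than by a real CSV chunker, and the column-wise layout is a stand-in for a columnar format, not Parquet's actual on-disk structure.

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to identify a chunk."""
    return hashlib.sha256(text.encode()).hexdigest()


def row_chunks(rows, rows_per_chunk=2):
    """Hash fixed groups of rows, each row serialized as one CSV line."""
    lines = [",".join(r) for r in rows]
    return {digest("\n".join(lines[i:i + rows_per_chunk]))
            for i in range(0, len(lines), rows_per_chunk)}


def column_chunks(rows):
    """Hash each column on its own, as a columnar layout would store it."""
    return {digest("\n".join(col)) for col in zip(*rows)}


before = [["1", "alice"], ["2", "bob"], ["3", "carol"], ["4", "dave"]]
# Append a new column (the length of the name) to every row.
after = [row + [str(len(row[1]))] for row in before]

reused_rows = row_chunks(before) & row_chunks(after)
reused_cols = column_chunks(before) & column_chunks(after)

print(len(reused_rows), "of", len(row_chunks(before)), "row-wise chunks reused")
# prints "0 of 2 row-wise chunks reused"
print(len(reused_cols), "of", len(column_chunks(before)), "column-wise chunks reused")
# prints "2 of 2 column-wise chunks reused"
```

Every serialized row changes when a column is appended, so no row-wise chunk survives, while the column-wise layout keeps the existing chunks and only adds one for the new column.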



