66 days ago


Hi HN, I'm Dennis from Greptime. This article is based on a talk by our engineer Ruihang Xia, who is also a PMC member of Apache DataFusion.

The most surprising finding for me was the hash seed trick: using the same random seed across the HashMaps in a two-phase aggregation gives a ~10% speedup on ClickBench. Because both phases hash keys identically, the bucket distribution from the first phase can be preserved during the merge, which eliminates rehashing overhead and keeps memory access CPU-cache friendly.
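To make the idea concrete, here is a minimal sketch of a two-phase count aggregation where every map is built from one shared `RandomState`. It is an illustration only, not DataFusion's or Greptime's actual code: the function names `partial_count` and `merge_counts` are hypothetical, and the standard library `HashMap` does not expose its bucket layout, so this shows how the seed is shared rather than reproducing the measured speedup.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;

/// Phase 1 (hypothetical helper): count keys in one partition,
/// using a map built from a clone of the shared seed.
fn partial_count(rows: &[&str], seed: &RandomState) -> HashMap<String, u64, RandomState> {
    let mut partial = HashMap::with_hasher(seed.clone());
    for &key in rows {
        *partial.entry(key.to_owned()).or_insert(0) += 1;
    }
    partial
}

/// Phase 2 (hypothetical helper): merge the partial maps into a final map
/// that hashes with the same seed, so each key gets the same hash value
/// it had in phase 1.
fn merge_counts(
    partials: Vec<HashMap<String, u64, RandomState>>,
    seed: &RandomState,
) -> HashMap<String, u64, RandomState> {
    let mut merged = HashMap::with_hasher(seed.clone());
    for partial in partials {
        for (key, count) in partial {
            *merged.entry(key).or_insert(0) += count;
        }
    }
    merged
}

fn main() {
    // One seed, created once and cloned into every map.
    let seed = RandomState::new();

    // Cloned RandomState values produce identical hashes for the same key.
    assert_eq!(seed.hash_one("a"), seed.clone().hash_one("a"));

    let p1 = partial_count(&["a", "b", "a"], &seed);
    let p2 = partial_count(&["b", "c"], &seed);
    let merged = merge_counts(vec![p1, p2], &seed);

    println!("a = {:?}", merged.get("a"));
    println!("b = {:?}", merged.get("b"));
    println!("c = {:?}", merged.get("c"));
}
```

By contrast, `HashMap::new()` gives each map its own `RandomState::new()`, so the partial maps and the final map would generally hash the same key to different values.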

We also discuss why Rust's prost library can be significantly slower than Go's protobuf implementation, and how fixing it improved our end-to-end throughput by 40%.

Happy to discuss Rust performance optimization or DataFusion internals.



