
This looks really useful for managing datasets in things like finance or AI where we need to be able to show data provenance for regulatory or reproducibility reasons. Since it's implemented in Bash, I'd be curious about performance on larger datasets. I wonder if there would be any scalability improvements in moving to something like C or Carbon?


Performance is directly related to the number of files you track. The first run will be slow for large datasets, but incremental changes can use the local cache to speed things up.
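To illustrate the kind of incremental caching described above, here is a minimal sketch (not the tool's actual code; the cache file name, its tab-separated layout, and the `hash_tracked` helper are all hypothetical): files are re-hashed only when their modification time differs from the cached entry, so unchanged files cost a cache lookup instead of a full checksum.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of mtime-based checksum caching.
# Cache format (assumed): path<TAB>mtime<TAB>sha256, one line per file.
set -eu

CACHE=".provenance-cache"
touch "$CACHE"

hash_tracked() {
    # $1: file to hash; reuse the cached digest when mtime is unchanged.
    mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1")  # GNU, then BSD stat
    cached=$(awk -F'\t' -v p="$1" -v m="$mtime" \
        '$1==p && $2==m {print $3}' "$CACHE")
    if [ -n "$cached" ]; then
        printf '%s  %s (cached)\n' "$cached" "$1"
    else
        sum=$(sha256sum "$1" | cut -d' ' -f1)
        printf '%s\t%s\t%s\n' "$1" "$mtime" "$sum" >> "$CACHE"
        printf '%s  %s (hashed)\n' "$sum" "$1"
    fi
}

echo "hello" > data.txt
hash_tracked data.txt   # first call: computes and stores the hash
hash_tracked data.txt   # second call: served from the cache
```

The first invocation pays the full hashing cost; the second hits the cache, which is why only the initial run over a large dataset is slow.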

That said, I built the bash version as a proof-of-concept with the idea of implementing it using Zig once I'm happy with the tool's usability.



