
I love this article! But I think this insight shouldn't be surprising. Distribution always has overheads, so if you can do things on a single machine it will almost always be faster.

I think a lot of engineers expect 100 computers to be faster than 1, simply because there are more of them. But what we're really looking at is a process, and a process that has to shift data between machines does strictly more work (serialization, network transfer, coordination), so it will often be slower.

Where Spark/Daft are needed is when you have 1 TB of data or something crazy where a single machine isn't viable. If I'm honest, though, I've seen a lot of occasions where someone thought they had that problem, and none so far where they actually did.
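
For what it's worth, here's a minimal sketch (mine, not from the article) of what "just use one machine" often looks like in practice: stream the file and aggregate, so the working set stays small no matter how large the input is. The file name and column names are hypothetical placeholders.

```python
import csv
from collections import defaultdict


def sum_by_key(path: str, key_col: str, val_col: str) -> dict[str, float]:
    """Group-by/sum over a CSV, streaming one row at a time (constant memory)."""
    totals: defaultdict[str, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # rows are read lazily, never all at once
            totals[row[key_col]] += float(row[val_col])
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical usage: no cluster, no shuffle, no serialization overhead.
    print(sum_by_key("events.csv", "user_id", "amount"))
```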



The "Scalability! But at what COST?" paper (PDF: https://www.usenix.org/system/files/conference/hotos15/hotos...) is my favorite thing. A single-threaded implementation wipes the floor with the big distributed solutions.
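
In that spirit, here's a minimal sketch (mine, not the paper's code) of the kind of single-threaded baseline the COST paper compares against distributed systems: connected components over an edge list via union-find, streaming edges from a whitespace-separated "src dst" text file (the path is a placeholder).

```python
def connected_components(edge_file: str) -> dict[int, int]:
    """Single-threaded connected components over a plain-text edge list."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        # Find the root of x, halving paths as we go.
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    with open(edge_file) as f:
        for line in f:
            src, dst = map(int, line.split())
            parent[find(src)] = find(dst)  # union the two components

    # Map every vertex to a canonical representative.
    return {v: find(v) for v in list(parent)}
```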



