> Spark is significantly more efficient than Hadoop.
I don’t know about your specific workload, but I’ve seen quite a few Hadoop setups that ran at 100% load most of the time and were replaced by relatively simple non-Hadoop-based code that used 2% to 10% of the hardware and ran about as fast.
I didn’t spend much time evaluating the “before” setups, but at least one of those workloads spent 90% of that 100% load on [de]serialization.
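To give a concrete sense of what “relatively simple non-Hadoop-based code” can look like, here is a minimal single-machine word-count sketch in Rust. It is purely illustrative: the input path and the word-count task are assumptions, not the actual workloads I’m describing.

```rust
// Hypothetical sketch: a single-process word count over a local file,
// standing in for the kind of simple non-Hadoop code mentioned above.
// The input path "input.txt" is an assumption; any line-oriented text works.
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let reader = BufReader::new(File::open("input.txt")?);
    let mut counts: HashMap<String, u64> = HashMap::new();

    // The whole "map" and "reduce" fits in one loop: no cluster scheduling,
    // no shuffle phase, and no per-record [de]serialization between stages.
    for line in reader.lines() {
        for word in line?.split_whitespace() {
            *counts.entry(word.to_string()).or_insert(0) += 1;
        }
    }

    // Print the ten most frequent words.
    let mut top: Vec<_> = counts.into_iter().collect();
    top.sort_by(|a, b| b.1.cmp(&a.1));
    for (word, n) in top.into_iter().take(10) {
        println!("{word}\t{n}");
    }
    Ok(())
}
```

The point of the sketch is the design, not the task: everything stays in one process and in memory, so the [de]serialization and coordination costs that dominated the Hadoop setups simply don’t exist.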
It’s not my link; it’s by Frank McSherry, who is commenting in this thread. I hope he can chime in on why he chose this specific example, but it matches my experience very well.