> Customer data sizes followed a power-law distribution. The largest customer had double the storage of the next largest customer, the next largest customer had half of that, etc
I’m no statistician, but I’m like 99% sure that’s an exponential, not a power law
There’s a world of difference. The point of an exponential is that you can ignore big things. The point of a power law is that you can’t.
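To make that concrete, here's a minimal sketch (my own illustration with made-up numbers, not from the article) of the two rank-size curves. "Each customer has half the storage of the one above it" is exponential decay in rank; a power law (Zipf-style) decays only polynomially:

```python
def exponential_size(rank, top=1.0):
    # s_k = top * 2^-(k-1): halves at every step down the ranking
    return top * 2.0 ** -(rank - 1)

def power_law_size(rank, top=1.0, alpha=1.0):
    # s_k = top * k^-alpha: classic Zipf-style decay
    return top * rank ** -alpha

for rank in (1, 2, 10, 100):
    print(f"rank {rank:>3}: exponential {exponential_size(rank):.2e}, "
          f"power law {power_law_size(rank):.2e}")

# At rank 100 the exponential customer is ~1.6e-30 of the largest
# (ignorable), while the power-law customer is still 1% of the largest.
# Under the exponential, the whole tail sums to a bounded multiple of
# the top customer; under a power law, big entries never stop mattering.
```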
The thesis of the article is plausible, but I think it’s missing the broader perspective.
It makes sense that science picks “low-hanging fruit” early on. Then, on average, later discoveries require more effort.
But the rate of progress depends on both how much effort a discovery takes AND how much effort is available. The progress of society has made it possible to aim exponentially more resources at solving problems. Computers let us automate things. Medicine & farming mean more humans can do higher-value work. Better politics means fewer people dying in wars.
So I don’t care if each discovery takes more effort than the last. As long as we get exponentially more resources to go along with it, we can keep creating exponentially more knowledge for a looong time.
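Back-of-envelope sketch of that claim (my own toy model, not from the article): suppose the n-th discovery costs twice as much effort as the (n-1)-th, while the effort available per period also doubles. The discovery rate then holds steady instead of stalling:

```python
# Toy model: each discovery costs 2x the previous one, and society's
# research budget doubles every period. Unspent effort carries over.
budget, bank, next_cost, total = 1.0, 0.0, 1.0, 0

for period in range(1, 11):
    bank += budget
    made = 0
    while bank >= next_cost:
        bank -= next_cost
        next_cost *= 2.0   # each discovery is twice as hard as the last
        total += 1
        made += 1
    print(f"period {period}: +{made} (total {total})")
    budget *= 2.0          # exponentially growing resources

# Settles at ~1 discovery per period: exponentially harder problems are
# offset by exponentially growing resources, so progress doesn't stall.
```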
Not as far as I know. The primary users & contributors are Apple and Snowflake. It was only made OSS a few years ago, and there were already a number of comparable alternatives (scalable, consistent OLTP databases like Cockroach, Yugabyte, and Fauna) with companies behind them.
I thought the edges represented the strength of relationships between teammates. As people get to know each other, they settle into roles and communication patterns, worrying less about social status, complementing each other, making decisions faster, and anticipating each other. Time together makes a good team more effective.
“Powered Twitter” is an overstatement. Storm was indeed used at Twitter for select streaming use cases, but it was a bit of a mess and ended up being rewritten from the ground up for 10x improvements in latency and throughput [1]. Marz was at the company for < 2 years. Lately, Twitter has been moving data processing use cases to GCP [2].
Storm is also not very well regarded in the stream processing community due to its restrictive model, poor guarantees, and abysmal performance [3].
I have nothing against Marz, but I do think skepticism is warranted until we see what they’ve built.
For reference, TNT is 1 kcal/g. This is 6.2 kcal/g, more than six times the energy density of TNT.