Hacker News: bokardo's comments

Haha...I did write a response, but I tried to not be as baiting...in fact I hate the whole "horseshit" meme...can't we just respectfully debate?



Sorry about that...


Good point. Trials work well in some cases, not in others. For example, for products focused on startups, trials can be very effective. For products purchased by people other than those who use them, trials simply aren't used. So...knowing whether purchasers = users is important.


Sorry for the site being down folks...my WordPress install is leaking serious memory somewhere and I'm restarting it every five minutes...


Hi, I'm one of the creators of http://www.abtests.com. The issue of statistical significance has come up over and over, so I'll try to explain our view of it.

We ask people to input their raw data...both trials and conversions. If they do this honestly (anybody can fake data about anything) then in our view the results speak for themselves. We've had folks upload data that was obviously not statistically significant, and we've had people write blog posts denouncing those results. We've also had folks upload test data that was statistically significant and people say they're learning a lot.

So we've had both solid and suspect data uploaded to the site with good discussion around it. This is exactly what we hoped for...I think in the future, as more tests get uploaded, the wheat will be separated from the chaff, so to speak, and those tests with significant data will get lots more attention than those without. In fact, we're already seeing this in the traffic logs.

And, as several folks have mentioned, many tools do the hard stats math for you, telling you when your data is statistically significant. This helps people know when they can be confident in sharing their data with others.


Doing the math here. A/B tests with conversions are modeled as binomial variables, so the standard error of a conversion rate is sqrt(p(1-p)/n), where p is the conversion rate and n is the number of trials (p(1-p) is the variance of a single Bernoulli trial). Calculating the standard error for both of your versions: sqrt(0.002*(1-0.002)/2834) = 0.0008, and for the other version the SE is 0.0017. Now, since there is a large number of trials, you can model the difference of the two binomial proportions as a normal distribution, whose standard deviation is sqrt(se_1^2 + se_2^2) = 0.0019.

Now, significance is checked using a one-tailed z score (we are testing whether the difference between the two rates is greater than zero). The z score in this case is (p_1 - p_2)/std, that is (0.008 - 0.002)/0.0019 = 3.1579, which is way larger than the critical value of 1.645 (which corresponds to 95% confidence).

So, the difference is indeed statistically significant. A note of caution: a common rule of thumb says you shouldn't approximate a binomial with a normal distribution until you have at least 10 successes and 10 failures in each group, and the lower-converting version here (roughly 6 conversions) falls short of that, so take the approximation with a grain of salt.
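As a minimal sketch, here's the same one-tailed z test in plain Python. The raw counts 6/2834 and 24/2836 are an assumption consistent with the rounded rates quoted above; using the unrounded rates moves the z score slightly (to about 3.3) but doesn't change the conclusion:

```python
import math

# Assumed raw A/B counts (consistent with the ~0.002 and ~0.008 rates above)
n_a, c_a = 2834, 6
n_b, c_b = 2836, 24

p_a = c_a / n_a  # ~0.002
p_b = c_b / n_b  # ~0.008

# Standard error of each binomial proportion: sqrt(p(1-p)/n)
se_a = math.sqrt(p_a * (1 - p_a) / n_a)
se_b = math.sqrt(p_b * (1 - p_b) / n_b)

# SE of the difference of two (approximately normal) proportions
se_diff = math.sqrt(se_a**2 + se_b**2)

# One-tailed z score; compare against the 95% critical value of 1.645
z = (p_b - p_a) / se_diff
print(round(z, 2))
```

Since the z score clears 1.645 comfortably, the one-tailed test rejects the hypothesis that B is no better than A at the 95% level.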


See my reply lower in the thread - I worked out the numbers using Bayesian inference to find the exact probability that B is better than A, subject to a number of assumptions. The benefit of this approach is that it's exact so you don't need a certain number of samples to properly approximate a normal distribution. The answer is that B is almost certainly better than A. Here's the calculation I plugged into Wolfram Alpha:

2835 2837 choose[2834,6] choose[2836,24] NIntegrate[(f^6) (1-f)^2828 (g^24) (1-g)^2812,{f,0,1},{g,f,1}]
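If you'd rather sanity-check this without Wolfram Alpha, here's a Monte Carlo sketch of the same comparison in plain Python (stdlib only). It uses the counts from the expression above (6/2834 for A, 24/2836 for B) and a uniform prior, under which each posterior conversion rate is Beta(successes + 1, failures + 1); the function name is just for illustration:

```python
import random

random.seed(42)

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=200_000):
    """Estimate P(rate_B > rate_A) by sampling from the Beta posteriors."""
    wins = 0
    for _ in range(draws):
        f = random.betavariate(conv_a + 1, n_a - conv_a + 1)  # posterior draw for A
        g = random.betavariate(conv_b + 1, n_b - conv_b + 1)  # posterior draw for B
        if g > f:
            wins += 1
    return wins / draws

# Prints a value near 0.999: B is almost certainly better than A
print(prob_b_beats_a(6, 2834, 24, 2836))
```

Sampling avoids the double integral entirely and agrees with the exact calculation to Monte Carlo precision, which makes it a handy cross-check.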


Hey folks, thanks for the feedback. The reason why I titled the piece "slow erosion" is that this will take years...I know how valuable Google is for fact-based searches. But slowly, social search and in-context advertising will gain, not because Google will get worse, but because people will be starting from a different place. They'll start on Twitter or Facebook.

The key insight is that people trust others for certain types of information such as recommendations. A single recommendation from a friend is much more powerful than a list of restaurants Google will give you. "Hey, I know you and I know you will like this restaurant". You could argue that Google will eventually know a lot about us (they do already), but the fact is that social interaction trumps reference material in a lot of cases.

So this will probably take years, and many of the arguments against are that right now Google is better. No denying that...but have you seen the ads on Facebook lately? They're pretty stinky, but twice as good as they were even months ago.

