Hacker News: usgroup's comments

I think that's overly reductive. In the worst case, Dempster–Shafer (DS) operates on up to 2^M sets, where M is the cardinality of the hypothesis space. That blow-up doesn't happen if the hypotheses are hierarchical, if evidence is frequently about the same sets, or if there just isn't enough evidence to fuse to get to 2^M.

And in the worst-case scenario, there are efficient approximation methods that can be used.
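For concreteness, here's a minimal sketch of Dempster's rule of combination (the frame, focal elements, and masses are invented for illustration). Note that the combined mass function's focal elements come only from intersecting the inputs' focal elements, which is why the 2^M blow-up is a worst case rather than the norm:

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions,
    given as dicts mapping frozenset focal elements to mass."""
    raw, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            raw[inter] = raw.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass that lands on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: combination undefined")
    # renormalise over the non-conflicting mass
    return {s: v / (1.0 - conflict) for s, v in raw.items()}

theta = frozenset("abc")  # hypothetical frame of discernment
m1 = {frozenset("a"): 0.6, theta: 0.4}
m2 = {frozenset("b"): 0.5, theta: 0.5}
m12 = combine(m1, m2)  # only 3 focal elements, not 2^3 sets
```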


Injective doesn’t mean bijective, and that seems obvious. That is, presumably very many inputs will map to the output “Yes”.


AFAICT surjectivity was already a given before this paper; their contribution is the injectivity part (and thus invertibility).


Doing Advent of Code 2025 in SETL this year.


I think more often people cast the widest net and then filter what comes back based on “is this better than what I have”.

I’m not sure that the process the author describes is all that common in practice even if it is eminently sensible.


:-) I would have hired you!


That no one doing serious statistics uses QGIS is false, as evidenced by both the community and the sponsors. Try searching “who uses QGIS”.


See the Jeffreys interval section here:

https://en.m.wikipedia.org/wiki/Binomial_proportion_confiden...

The blog post uses a non-informative Jeffreys prior.


The Jeffreys posterior in the answer is closed-form and Bayesian. The other answer is a profile likelihood.

Neither involves Monte Carlo sampling. Both are general and principled.


The answer looks a bit simplistic compared to the question (as I interpreted it, at least). In order to estimate the risk of incorrect ordering you have to calculate P(p2 > p1), where p1 and p2 are drawn from different beta distributions. AFAIK there's no closed-form expression for that probability (so Monte Carlo is one possible approach).
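A quick sketch of that Monte Carlo approach using only the standard library (the counts 2/10 and 5/10 and the Jeffreys Beta(1/2, 1/2) prior here are illustrative assumptions):

```python
import random

random.seed(42)  # reproducible illustration

def prob_greater(a2, b2, a1, b1, n=100_000):
    """Monte Carlo estimate of P(p2 > p1) for independent
    p1 ~ Beta(a1, b1) and p2 ~ Beta(a2, b2)."""
    hits = sum(
        random.betavariate(a2, b2) > random.betavariate(a1, b1)
        for _ in range(n)
    )
    return hits / n

# Jeffreys posteriors are Beta(s + 1/2, n - s + 1/2).
# Item 1: 2 successes in 10 trials -> Beta(2.5, 8.5)
# Item 2: 5 successes in 10 trials -> Beta(5.5, 5.5)
p = prob_greater(5.5, 5.5, 2.5, 8.5)
```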


The question doesn’t ask for that; it explicitly asks us to control for over-estimation of the fraction. That said, I rather like your interpretation as an extension.


Ok. I may have read a bit too much into this paragraph:

> Order the items above, smallest fraction first, whilst taking into account the uncertainty in the number of trials to bound the probability of over-estimating each fraction.

But why mention ordering if you’re not looking for statistical reasoning around the ordering in particular?


Bingo.


From the question:

“However, it is very important that the uncertainty in the number of trials is taken into account because over-estimating a fraction is a costly mistake.”

It seems fairly clear to me that you’re supposed to use a lower-bound estimate that takes into account the variance of the fraction due to the number of trials, in a way that bounds the chance of over-estimation.

Further, there is no need for a heuristic when there are several statistical models for this exact problem with clear properties. Some are given in the answer.
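One such model, sketched here under invented counts: rank items by a lower credible bound of each fraction's Jeffreys posterior, so the chance of over-estimating any fraction is bounded by the chosen level. The quantile is estimated by plain posterior sampling (stdlib only) rather than a closed-form inverse-beta:

```python
import random

random.seed(0)

def jeffreys_lower_bound(successes, trials, level=0.05, n=50_000):
    """Approximate `level`-quantile of the Jeffreys Beta posterior
    for a binomial fraction, i.e. a lower credible bound such that
    P(true fraction < bound) is about `level`."""
    a, b = successes + 0.5, trials - successes + 0.5
    samples = sorted(random.betavariate(a, b) for _ in range(n))
    return samples[int(level * n)]

# hypothetical items: name -> (successes, trials)
items = {"A": (1, 20), "B": (20, 100), "C": (3, 10)}
bounds = {k: jeffreys_lower_bound(s, t) for k, (s, t) in items.items()}
ranking = sorted(items, key=bounds.get)  # smallest lower bound first
```

Ranking by the 5% lower bound, rather than the raw fraction, penalises items whose small trial counts leave the fraction highly uncertain.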


Out of context, the expression "the uncertainty in the number of trials" would refer to missing knowledge of how many trials actually ran.

In the context of the post this doesn't make sense, so the reader is left to hypothesize what the writer actually meant.


I agree it could be clearer, but as a general rule, if you find an interpretation under which the question doesn’t make sense, try considering another interpretation.


I think "uncertainty due to the number of trials" would be clearer.


"Uncertainty in the number of successes due to the number of trials" would be even better.


That would also be an incorrect phrasing. This entire thread is a good illustration of the difficulty of speaking precisely about probabilistic concepts.

(The number of successes has zero uncertainty. If you flip a coin 10 times and get 5 heads, there is no uncertainty on the number of heads. In general, for any statistical model the uncertainty is only with respect to an underlying model parameter - in this example, while your number of successes is perfectly known, it can be used to infer a probability of success, p, of 0.5, and there is uncertainty associated with that inferred probability.)
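To make that concrete, a small sketch (the flat Beta(1, 1) prior is assumed purely for illustration): the observed 5 heads in 10 flips are fixed data, yet the inferred p carries a wide credible interval:

```python
import random

random.seed(1)

# Data: 5 heads in 10 flips -- perfectly known, zero uncertainty.
heads, flips = 5, 10
n = 50_000

# Under an assumed flat Beta(1, 1) prior, the posterior for the
# success probability p is Beta(heads + 1, flips - heads + 1).
samples = sorted(random.betavariate(heads + 1, flips - heads + 1)
                 for _ in range(n))
lo, hi = samples[int(0.025 * n)], samples[int(0.975 * n)]
# The point estimate is 0.5, but the 95% credible interval is wide.
```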


I guess I struggle to understand why under-estimation is worse than over-estimation when the final result is a ranking. It seems like they're equally likely to produce an incorrect ranking!

