
The author gave two arguments, a weak one and a stronger one. You quoted the weaker one. The OpenAI paper contains the stronger one, basically explaining that models will guess at the next token rather than saying "idk" because their guess could turn out to be correct, while an "idk" never is.

The strongest argument in my mind for why statistical models cannot avoid hallucinations is the fact that reality is inherently long-tail. There simply isn't enough training data to cover it, nor enough FLOPs to consume that data even if it existed. If we focus on the limited domain of chess, LLMs cannot avoid hallucinating moves that do not exist, let alone give you the best move. And scaling up training data to cover all positions is simply computationally impossible: there are on the order of 10^44 legal positions.
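To make the chess point concrete, here's a rough sketch of how you'd catch a hallucinated move. It assumes the python-chess library and treats the model's suggestion as an opaque string (the ask-the-LLM part is left out); nothing about next-token prediction forces that string to be legal in the current position.

    import chess

    def is_hallucinated(board: chess.Board, move_uci: str) -> bool:
        """Return True if the suggested move is malformed or illegal here."""
        try:
            move = chess.Move.from_uci(move_uci)  # raises on malformed strings
        except ValueError:
            return True
        return move not in board.legal_moves

    # Position after 1.e4 e5 2.Nf3 Nc6, white to move
    board = chess.Board("r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3")
    print(is_hallucinated(board, "f1c4"))  # False: Bc4 is legal
    print(is_hallucinated(board, "e1g1"))  # True: can't castle through the f1 bishop

The engine can only do this check because legality is exhaustively specified; for most long-tail facts about the world there is no such oracle to check the guess against.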

And even if it were possible (though still hugely expensive), it wouldn't be practical at all. Your phone can run a better chess algorithm than the best LLM.
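For comparison, this is roughly what the cheap, phone-class alternative looks like through the same python-chess library; the path to the Stockfish binary is an assumption, and a weaker engine would make the point just as well.

    import chess
    import chess.engine

    board = chess.Board()
    # assumes a local Stockfish binary; any UCI engine works
    engine = chess.engine.SimpleEngine.popen_uci("/usr/bin/stockfish")
    result = engine.play(board, chess.engine.Limit(time=0.1))
    print(result.move)  # always legal, and far stronger than an LLM's guess
    engine.quit()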

All of this is to say, going back to your Fermat's Last Theorem point, that we may eventually figure out a faster and cheaper way, and decide we don't care about tall stacks of transformers anymore.


