Hacker News

Humans are much better at out-of-sample prediction than LLMs, and benchmarks inherently cannot be out of sample. I believe that explains the disconnect: LLMs keep getting better at in-sample prediction (benchmarks) while improving far less at out-of-sample prediction (actual work).

