AlphaProof paper (IMO 2024 Silver) is finally published in Nature [pdf] (nature.com)
2 points by zuzatm 26 days ago | 1 comment


One notable difference from what one would expect from an LLM-RL paper is the use of test-time RL. I guess when you have a very strong verifier, you can specialize your network to solve only the problem at hand. Curious whether this can also be applied to natural-language reasoning.
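
To make the idea concrete, here is a toy sketch of test-time RL against an exact verifier. It is not AlphaProof's method: the names `verifier` and `test_time_rl` are hypothetical, the "network" is just a softmax over ten candidate answers, and the update is plain REINFORCE. It only illustrates how a policy can be specialized to a single problem when every sampled answer can be checked.

```python
import math
import random


def verifier(candidate: int) -> bool:
    # Hypothetical exact checker for one toy problem: find x with x * x == 49.
    return candidate * candidate == 49


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def test_time_rl(num_candidates=10, steps=2000, lr=0.5, seed=0):
    # The policy is a vector of logits over candidate answers 0..num_candidates-1.
    # It starts uniform and is trained only on this one problem.
    rng = random.Random(seed)
    logits = [0.0] * num_candidates
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_candidates), weights=probs)[0]
        reward = 1.0 if verifier(action) else 0.0
        # REINFORCE: d log pi(action) / d logit_i = 1[i == action] - probs[i].
        for i in range(num_candidates):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad
    return softmax(logits)


probs = test_time_rl()
best = max(range(len(probs)), key=probs.__getitem__)
print(best, round(probs[best], 3))
```

Because the reward comes from a checker that cannot be fooled, the policy can only gain probability mass on answers that actually verify. In natural-language reasoning the hard part would presumably be getting a verifier anywhere near that reliable.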



