AlphaProof paper (IMO 2024 Silver) is finally published in Nature [pdf]

zuzatm · 2025-11-26T10:59:03 1764154743

One notable difference from what one would expect of an LLM-RL paper is the use of test-time RL. I guess when you have a very strong verifier, you can specialize your network to solve only the problem at hand. Curious whether this can also be applied to natural-language reasoning.