>I don't see why these companies can't just stop training at some point.
Because training isn't just about making brand new models with better capabilities, it's also about updating old models to stay current with new information. Even the most sophisticated present-day model with a knowledge cutoff date of 2025 would be severely crippled by 2027 and utterly useless by 2030.
Unless there is some breakthrough that lets existing models cheaply incrementally update their weights to add new information, I don't see any way around this.
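To make the scale of the problem concrete, here is a minimal sketch of what "incrementally updating the weights" means today: continued fine-tuning of an existing causal LM on post-cutoff text. This assumes the Hugging Face transformers + PyTorch stack; the model name and corpus are placeholders. The mechanics are simple, but doing this at frontier scale is anything but cheap, which is the whole issue.

```python
# Sketch: "refreshing" an existing causal LM by continued fine-tuning on new text.
# Assumes transformers + torch are installed; "gpt2" and the corpus are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for whatever base model you want to keep current
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

new_documents = [
    "Text published after the model's knowledge cutoff...",  # placeholder data
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for doc in new_documents:
    batch = tokenizer(doc, return_tensors="pt", truncation=True, max_length=512)
    # Standard causal-LM objective: the labels are the input ids themselves.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```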
There is no evidence that RAG delivers equivalent performance to retraining on new data. Merely having information in the context window is very different from having it baked into the model weights. Relying solely on RAG to keep model results current would also degrade with time, as more and more information would have to be incorporated into the context window the longer it's been since the knowledge cutoff date.
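For contrast, here is roughly what "relying on RAG" looks like in practice: embed a store of post-cutoff documents, retrieve the most relevant ones at query time, and stuff them into the prompt. This is an illustrative sketch, assuming a sentence-transformers embedding model; the documents and query are placeholders. Nothing is ever learned by the model, and the store it has to retrieve from only grows the further you get from the cutoff.

```python
# Illustrative RAG sketch: knowledge lives in an external store and is injected
# into the context window at query time, not baked into the weights.
# Embedding model, documents, and query are all placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Post-cutoff knowledge; in reality this keeps growing over time.
documents = [
    "Company X shipped product Y in 2026.",
    "Standard Z was superseded in 2027.",
]
doc_vectors = embedder.encode(documents)  # shape: (num_docs, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embedder.encode([query])[0]
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(-sims)[:k]]

query = "What happened to standard Z?"
context = "\n".join(retrieve(query))
# The model only ever sees this assembled prompt for the current request.
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```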
I honestly do not think that we should be training models to regurgitate training data anyway.
Humans do this to a limited degree, but the things we can recount from memory are far simpler than, say, the contents of an entire paper.
There's a reason we invented writing stuff down. And I do wonder if future models should be optimised for RAG during training: train for reasoning and stringing coherent sentences together, sure, but with a focus on using that to connect hard data found in the context.
And who says models won't have massive or unbounded contexts in the future? Or that predicting a single token (or even a sub-sequence of tokens) will remain a one-shot, synchronous activity?