The tutorials, examples, demos, and blogs for every vector DB or RAG system that I know of focus mostly on toy datasets far, far under a million tokens. So most people can be forgiven for the view that long context would kill RAG - and it really should for these trivial use cases.
I don't know if vector DB and RAG vendors don't demo their software with legitimately large datasets because they don't want to compete with their customers, or because they're not confident enough in the results, or because they don't really use their software themselves.
To give an example of a RAG dataset I have played with: it is about 10k documents and 5M tokens. Adding prior versions, expanding the coverage, or augmenting from other sources, I'm sure it could blow well past 10M tokens. Maybe extremely long context models will get extremely fast and cheap, but at least for the next couple of years you probably won't be stuffing all of this into the context. Even with a 1M context you still have to filter, retrieve, and rank the documents. And 10,000 documents is really not a lot compared to other corpora you could imagine (e-mails, media, law, science, code, etc.). But it is certainly bigger than many of the use cases being pitched for RAG - like a few hundred personal notes or a corporate wiki with 500 poorly maintained pages on it.
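Even at that scale the basic shape of the pipeline doesn't change: embed once, then filter, retrieve, and rank per query under some token budget. Here's a minimal sketch of what I mean, assuming sentence-transformers for embeddings and plain cosine similarity; the model name, token budget, and chars-to-tokens ratio are illustrative assumptions, not any particular vendor's setup:

```python
# Minimal retrieve-then-rank sketch over a ~10k document corpus.
# Assumes sentence-transformers is installed; model choice and the
# token budget are illustrative, not recommendations.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(docs: list[str]) -> np.ndarray:
    # Embed every document once, up front; normalize so a dot product
    # is cosine similarity.
    emb = model.encode(docs, convert_to_numpy=True)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def retrieve(query: str, docs: list[str], index: np.ndarray,
             token_budget: int = 1_000_000) -> list[str]:
    q = model.encode([query], convert_to_numpy=True)[0]
    q = q / np.linalg.norm(q)
    scores = index @ q                    # cosine similarity per document
    ranked = np.argsort(-scores)          # best match first
    picked, used = [], 0
    for i in ranked:
        cost = len(docs[i]) // 4          # rough chars-to-tokens estimate
        if used + cost > token_budget:
            break
        picked.append(docs[i])
        used += cost
    return picked                         # what actually goes into the context
```

Whether the budget is 4k or 1M tokens, that filter-and-rank step is still there; the budget just moves.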
Even for small use cases with fewer than a million tokens, RAG is often the better approach because of the time and money huge context windows consume: a minute (!!) and a dollar (?) per query in the Gemini demo.
I have no idea if massive context will kill RAG, but an article written by someone who sells vector databases for a living is not the most conflict-of-interest-free source.
I don’t understand why it is phrased as either/or at all. The two techniques seem to come with very different trade-offs, which can be harnessed appropriately.
Are there open models that support these extremely long context lengths? It seems that for RAG-like functionality this is much easier than retraining the model, even if the quality isn't always perfect. For personal usage I'm very interested in this application.
Using a massive context window is akin to onboarding a new employee before every mundane task, while a trained employee will take on a new task easily with existing context.
The trade-off is simply cost. In the LLM scene, that cost shows up as both speed of execution and token cost.
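To put rough numbers on that trade-off (the price and token counts below are made-up round figures for illustration, not any vendor's actual rates):

```python
# Back-of-the-envelope comparison: stuffing a whole corpus into the
# context vs. retrieving a small slice of it. All numbers are assumed.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # assumption: $0.01 per 1k input tokens

corpus_tokens = 5_000_000          # the ~5M-token corpus mentioned upthread
retrieved_tokens = 8_000           # a handful of RAG-selected chunks

full_context_cost = corpus_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
rag_cost = retrieved_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

print(f"full-context query: ${full_context_cost:.2f}")   # $50.00 per query
print(f"RAG query:          ${rag_cost:.2f}")            # $0.08 per query
```

The exact prices will be wrong, but the ratio between the two is the point: you pay for every token you put in the window, every query.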
Fitting things into 4096 tokens is an advantage of RAG, but the quality of answers is night and day when you have actual sources. It also commodifies web indexing, which is interesting.
I’ve said it once and I’ll say it again: cost per token is still part of the equation. Why throw money away by stuffing the context with unneeded tokens?
> It is a proven solution that effectively addresses fundamental LLM challenges such as hallucinations and lacking domain-specific knowledge.
mm.
> While RAG has proven beneficial in reducing LLM hallucinations, it does have limitations.
mhm.
> [good models] support 32k-token-long contexts, showcasing a substantial improvement in embedding capabilities. This enhancement in embedding unstructured data also elevates RAG’s understanding of long contexts.
mmmmhmmm.
So, let me get this straight.
A model with a long context makes RAG significantly more effective because you can put more context into the input.
...but a model with a really massive context window won't be significantly better?
??
> Vector databases, one of the cutting-edge AI technologies, are a core component in the RAG pipeline. Opting for a more mature and advanced vector database, such as Milvus,
> Conclusion: RAG Remains a Linchpin for the Sustained Success of AI Applications.
Aha! So, this is a sales pitch. Right.
No. You're wrong.
RAG, as it currently exists, is a dead end technology. It exists because models don't have a large enough context window.
If models get a significantly larger context window, it will become mostly irrelevant.
Obviously, there are always going to be some technical/compute limitations on the window size, and at some level you'll always need to filter down to the relevant context to put into the input, so yes, technically the approach will always be around in some form.
However, RAG in its current form, where you have a tiny context window and you fill it with little snippets pulled from a vector DB... well, let's just say, if I were a vendor for a vector database product, I'd also be worried and also be producing opinion pieces like this.
An open model with a massive context would solve these problems trivially for most people, and make most 'vector db' products unnecessary for most uses.