The tutorials, examples, demos, and blogs for every vector DB or RAG system that I know of focus mostly on toy datasets far, far under a million tokens. So most people can be forgiven for the view that long context would kill RAG - and it really should for these trivial use cases.
I don't know if vector DB and RAG vendors don't demo their software with legitimately large datasets because they don't want to compete with their customers, or because they're not confident enough in the results, or because they don't really use their software themselves.
To give an example of a RAG dataset I have played with: it is about 10k documents and 5M tokens. Adding prior versions, expanding the coverage, or augmenting from other sources, I'm sure it could blow well past 10M tokens. Maybe extremely long context models will get extremely fast and cheap, but at least for the next couple of years you probably won't be stuffing all of this into the context. Even with a 1M context you still have to filter, retrieve, and rank the documents. And 10,000 documents is really not a lot compared to other corpora you could imagine (e-mails, media, law, science, code, etc.). But it is certainly bigger than many of the use cases being pitched for RAG - like a few hundred personal notes or a corporate wiki with 500 poorly maintained pages on it.
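Even at that scale the basic shape of the pipeline doesn't change: embed once, then filter, retrieve, and rank per query under some token budget. Here's a minimal sketch of what I mean, assuming sentence-transformers for embeddings and plain cosine similarity; the model name, token budget, and chars-to-tokens ratio are illustrative assumptions, not any particular vendor's setup:

```python
# Minimal retrieve-then-rank sketch over a ~10k document corpus.
# Assumes sentence-transformers is installed; model choice and the
# token budget are illustrative, not recommendations.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(docs: list[str]) -> np.ndarray:
    # Embed every document once, up front; normalize so a dot product
    # is cosine similarity.
    emb = model.encode(docs, convert_to_numpy=True)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

def retrieve(query: str, docs: list[str], index: np.ndarray,
             token_budget: int = 1_000_000) -> list[str]:
    q = model.encode([query], convert_to_numpy=True)[0]
    q = q / np.linalg.norm(q)
    scores = index @ q                    # cosine similarity per document
    ranked = np.argsort(-scores)          # best match first
    picked, used = [], 0
    for i in ranked:
        cost = len(docs[i]) // 4          # rough chars-to-tokens estimate
        if used + cost > token_budget:
            break
        picked.append(docs[i])
        used += cost
    return picked                         # what actually goes into the context
```

Whether the budget is 4k or 1M tokens, that filter-and-rank step is still there; the budget just moves.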
Even for small use cases with fewer than a million tokens, RAG is often the better approach because of the time and money huge context windows consume: a minute (!!) and a dollar (?) per query in the Gemini demo.
I have no idea if massive context will kill RAG, but an article written by someone who sells vector databases for a living is not the most conflict-of-interest-free source.
I don’t understand why it is phrased as either/or at all. The two techniques seem to come with very different trade-offs, which can be harnessed appropriately.
Are there open models that support these extremely long context lengths? It seems that for RAG-like functionality this is much easier than retraining the model, even if the quality isn't always perfect. For personal usage I'm very interested in this application.
Using a massive context window is akin to onboarding a new employee before every mundane task, while a trained employee will take on a new task easily with existing context.
The trade-off is simply cost. In the LLM scene, that cost shows up as both speed of execution and token cost.
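To put rough numbers on that trade-off (the price and token counts below are made-up round figures for illustration, not any vendor's actual rates):

```python
# Back-of-the-envelope comparison: stuffing a whole corpus into the
# context vs. retrieving a small slice of it. All numbers are assumed.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # assumption: $0.01 per 1k input tokens

corpus_tokens = 5_000_000          # the ~5M-token corpus mentioned upthread
retrieved_tokens = 8_000           # a handful of RAG-selected chunks

full_context_cost = corpus_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
rag_cost = retrieved_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

print(f"full-context query: ${full_context_cost:.2f}")   # $50.00 per query
print(f"RAG query:          ${rag_cost:.2f}")            # $0.08 per query
```

The exact prices will be wrong, but the ratio between the two is the point: you pay for every token you put in the window, every query.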
Fitting things into 4096 tokens is an advantage of RAG, but the quality of answers is night and day when you have actual sources. It also commodifies web indexing, which is interesting.
I’ve said it once and I’ll say it again: cost per token is still part of the equation. Why throw money away by stuffing the context with unneeded tokens?
> It is a proven solution that effectively addresses fundamental LLM challenges such as hallucinations and lacking domain-specific knowledge.
mm.
> While RAG has proven beneficial in reducing LLM hallucinations, it does have limitations.
mhm.
> [good models] support 32k-token-long contexts, showcasing a substantial improvement in embedding capabilities. This enhancement in embedding unstructured data also elevates RAG’s understanding of long contexts.
mmmmhmmm.
So, let me get this straight.
A model with a long context makes RAG significantly more effective because you can put more context into the input.
...but a model with a really massive context window won't be significantly better?
??
> Vector databases, one of the cutting-edge AI technologies, are a core component in the RAG pipeline. Opting for a more mature and advanced vector database, such as Milvus,
> Conclusion: RAG Remains a Linchpin for the Sustained Success of AI Applications.
Aha! So, this is a sales pitch. Right.
No. You're wrong.
RAG, as it currently exists, is a dead end technology. It exists because models don't have a large enough context window.
If models get a significantly larger context window, it will become mostly irrelevant.
Obviously, there are always going to be some technical/compute limitations on the window size, and at some level you'll always need to filter down to the relevant context to put into the input, so yes, technically the approach will always be around in some form.
However, RAG in its current form, where you have a tiny context window and you fill it with little snippets pulled from a vector DB... well, let's just say, if I were a vendor for a vector database product, I'd also be worried and also be producing opinion pieces like this.
An open model with a massive context would solve these problems trivially for most people, and make most 'vector db' products unnecessary for most uses.