
Many people are skeptical about the usefulness of 1M tokens because LLMs often start to degrade after about 100k. But this is big for Claude 4 because it uses automatic RAG once the context grows large. With retrieval narrowing the context to what's actually relevant, we'll be able to make good use of those 1M tokens.
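Roughly the idea, as a sketch (the threshold and helper functions are my guesses, not anything Anthropic has documented):

    # A guess at the routing, not Anthropic's actual logic: stuff the full
    # context into the prompt when it fits, fall back to retrieval when not.
    TOKEN_BUDGET = 100_000  # roughly where long-context quality starts to slip

    def count_tokens(text: str) -> int:
        # Crude stand-in for a real tokenizer.
        return len(text.split())

    def retrieve(query: str, documents: list[str], k: int) -> list[str]:
        # Placeholder ranking by naive word overlap; a real system
        # would use vector embeddings instead.
        words = set(query.lower().split())
        ranked = sorted(documents,
                        key=lambda d: -len(words & set(d.lower().split())))
        return ranked[:k]

    def build_prompt(query: str, documents: list[str]) -> str:
        if sum(count_tokens(d) for d in documents) <= TOKEN_BUDGET:
            context = "\n\n".join(documents)  # fits: use everything
        else:
            context = "\n\n".join(retrieve(query, documents, k=20))  # too big: RAG
        return f"{context}\n\nQuestion: {query}"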


How does this work under the hood? Does it build an in-memory vector database of the input sources and run queries on top of that data to supplement the context window?


No idea how it's implemented because it's proprietary. Details here: https://support.anthropic.com/en/articles/11473015-retrieval...


RAG commonly implies building some sort of vector database, which is then queried to augment the response. If it operates over the repo, I believe it indexes your codebase using vector embeddings.
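A minimal sketch of that approach (the chunking strategy, model choice, and repo path here are illustrative assumptions, not Anthropic's actual pipeline):

    # Hypothetical embedding-based retrieval over a repo.
    from pathlib import Path
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    def chunk(text: str, size: int = 40) -> list[str]:
        # Naive fixed-size line chunks; real systems often chunk by scope/AST.
        lines = text.splitlines()
        return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

    # Index: embed every chunk of every source file.
    chunks = []
    for path in Path("my_repo").rglob("*.py"):
        chunks.extend(chunk(path.read_text(errors="ignore")))
    index = model.encode(chunks, normalize_embeddings=True)  # unit vectors

    def top_k(query: str, k: int = 5) -> list[str]:
        # Cosine similarity reduces to a dot product on unit vectors.
        q = model.encode([query], normalize_embeddings=True)[0]
        best = np.argsort(index @ q)[::-1][:k]
        return [chunks[i] for i in best]

    # The top-k chunks get prepended to the prompt -- that's the "augmentation".
    print(top_k("where is the auth token validated?")[0])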



