d3m0t3p's comments | Hacker News

Yeah, but the goal is not to bloat the context space. Here you "waste" context by providing non-useful information. What they did instead is put an index of the documentation into the context; the LLM can then fetch the documentation. This is the same idea as skills, but it apparently works better without the agentic part of skills. Furthermore, instead of having a nice index pointing to the docs, they compressed it.
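A minimal sketch of the idea, assuming hypothetical file names and a made-up fetch_doc tool (not the authors' actual interface): only a compressed index sits in the prompt, and the model pulls full sections in on demand.

    # Hypothetical index-plus-fetch setup; names are illustrative.
    INDEX = """\
    api/auth.md           - tokens, OAuth flows, refresh
    api/search.md         - query syntax, pagination
    guides/rate-limits.md - quotas, backoff
    """  # a few tokens per document, not the documents themselves

    DOCS = {
        "api/auth.md": "...full text...",
        "api/search.md": "...full text...",
        "guides/rate-limits.md": "...full text...",
    }

    def fetch_doc(path: str) -> str:
        """Tool the model can call to load one document into context."""
        return DOCS[path]

    system_prompt = f"Docs index (call fetch_doc(path) for details):\n{INDEX}"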

The minification is a great idea. Will try this.

Their approach is still agentic in the sense that the LLM must make a tool call to load the particular doc in. The most efficient approach would be to know ahead of time which parts of the doc will be needed, and then give the LLM a compressed version of those docs specifically. That doesn't require an agentic tool call.
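A sketch of that non-agentic alternative, using a naive keyword match to pick docs up front (an assumption for illustration, not anyone's actual method):

    # Decide ahead of time which docs the task needs and inline them,
    # so no tool call happens at generation time.
    def select_docs(task: str, docs: dict[str, str]) -> str:
        words = task.lower().split()
        relevant = [body for path, body in docs.items()
                    if any(word in path for word in words)]
        return "\n\n".join(relevant)  # compressed versions would go here

    docs = {"api/auth.md": "...", "api/search.md": "...",
            "guides/rate-limits.md": "..."}
    prompt_context = select_docs("fix the auth token refresh bug", docs)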

Of course, it's a tradeoff.


What does it mean to waste context?

Longer context quite literally degrades attention performance on anything beyond needle-in-a-haystack lookups, in almost every model and to varying degrees. So, to answer the question: the "waste" is making the model dumber unnecessarily in an attempt to make it smarter.

The context window is finite. You can easily fill it with documentation and have no room left for the code and question you want to work on. It also means more tokens sent with every request, increasing cost if you're paying by the token.

Think of context switching when you yourself are programming. You can only hold some finite amount of concepts in your head at one time. If you have distractions, or try to focus on too many things at once, your ability to reason about your immediate problem degrades. Think also of legacy search engines: often, a more limited and focused search query vs a query that has too many terms, more precisely maps to your intended goal.

LLMs have always been limited in the number of tokens they can process at one time. That limit is increasing, but one problem is that chat threads continually grow as you send messages back and forth: within any session or thread, the full conversation is sent to the LLM with every message (aside from particular optimizations that compact or prune it). This also increases costs, which are charged per token. Efficiency of cost and performance/precision/accuracy dictates using the context window judiciously.
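A back-of-the-envelope sketch of that growth (the numbers are illustrative): without compaction, every turn resends the whole history, so the total tokens billed grow roughly quadratically with the number of turns.

    tokens_per_turn = 500   # assumed average tokens added per message pair
    turns = 40
    total_billed = sum(tokens_per_turn * t for t in range(1, turns + 1))
    print(total_billed)     # 410,000 tokens billed for 20,000 tokens of new text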


Same, Firefox iOS


The model is fine-tuned for chat behavior. So the style might be due to fine-tuning, or to more stylised text in the corpus; English evolved a lot in the last century.


Diverged as well as standardized. I did some research into "out of pocket" and how it differs in meaning between UK English (paying from one's own funds) and American English (uncontactable), and I recall 1908, the year of O. Henry's short story "Buried Treasure," being the current thought as to when the divergence happened.


Is that really the only thing you managed to remember?


Because the ML ecosystem is more mature on the Nvidia side. Software-wise, the CUDA platform is more advanced. It will be hard for AMD to catch up. It is good to see competition tho.


But the article shows that the Nvidia ecosystem isn't that mature either on the DGX Spark with ARM64. I wonder if Nvidia is still ahead for such use cases, all things considered.


On the DGX Spark, yes. On ARM64, Nvidia has been shipping drivers for years now. The rest of the Linux ecosystem is going to be the problem, most distros and projects don't have anywhere near the incentive Nvidia does to treat ARM like a first-class citizen.


In my own studies, software engineering was mostly about structuring code and coding patterns such as Visitor, Singleton, etc., i.e., how to create a maintainable codebase.
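For anyone who hasn't met these, a compact Visitor in Python (an illustrative sketch, not from any particular course): operations on a tree live in the visitor, so adding a new operation doesn't touch the node classes.

    class Num:
        def __init__(self, value): self.value = value
        def accept(self, visitor): return visitor.visit_num(self)

    class Add:
        def __init__(self, left, right): self.left, self.right = left, right
        def accept(self, visitor): return visitor.visit_add(self)

    class Evaluator:  # one operation over the tree, kept out of the nodes
        def visit_num(self, node): return node.value
        def visit_add(self, node):
            return node.left.accept(self) + node.right.accept(self)

    expr = Add(Num(1), Add(Num(2), Num(3)))
    print(expr.accept(Evaluator()))  # 6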


My software engineering course was about the software development life cycle, different business methodologies like agile and waterfall, and working in a group.

It was very helpful. I would have appreciated “how to create a maintainable codebase” as well though. “Singleton” was not a part of my vocabulary until 3 years into my career :/


> “Singleton” was not a part of my vocabulary until 3 years into my career :/

If you are a more old-school style programmer, you simply use the older term "global variable". :-)
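A small sketch of the comparison in Python (names are illustrative): the pattern mostly adds lazy construction and a single access point on top of what a global already gives you.

    class Config:
        _instance = None

        @classmethod
        def instance(cls):
            if cls._instance is None:   # created lazily on first use
                cls._instance = cls()
            return cls._instance

    a = Config.instance()
    b = Config.instance()
    print(a is b)       # True: one shared object, much like a global

    CONFIG = Config()   # the "old-school" version: a module-level global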


Looking back, I wish it never had been necessary to memorize all those design patterns just to get work done! All OOP has been is a huge distraction and mostly bs. This is me looking back across 30 years of work, so don't just downvote because you love OOP--try thinking about what I'm really saying here. OOP was, to me, an enormous bend in the river that eventually got pinched off and has become a horseshoe lake, destined to dry up and just become a scar on the software engineering landscape. It feels like it was all a big waste of time and someone's money making schemes, tbh.


Would you have some literature about that?


There's a ton, but it's pretty scattered. Yurii Nesterov's a big name, for example.
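For a taste of what that literature is about, a minimal sketch of Nesterov's accelerated gradient on a 1-D quadratic (step size and momentum are illustrative, not tuned):

    def grad(x):            # gradient of f(x) = x^2
        return 2 * x

    x, v = 5.0, 0.0         # position and velocity
    lr, momentum = 0.1, 0.9
    for _ in range(100):
        lookahead = x + momentum * v     # the "look-ahead" evaluation point
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    print(round(x, 6))      # close to the minimum at 0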


This sounds a lot like what the Muon / Shampoo optimizers do.
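For the curious, a rough sketch of the orthogonalization step at the heart of Muon: a Newton–Schulz iteration that pushes a (momentum) matrix toward the nearest semi-orthogonal one. The coefficients follow the public Muon reference implementation; treat the details as an assumption, not a spec.

    import numpy as np

    def newton_schulz(G, steps=5, eps=1e-7):
        # Coefficients as in the Muon reference implementation (assumed).
        a, b, c = 3.4445, -4.7750, 2.0315
        X = G / (np.linalg.norm(G) + eps)  # normalize so the iteration converges
        transposed = G.shape[0] > G.shape[1]
        if transposed:
            X = X.T
        for _ in range(steps):
            A = X @ X.T
            X = a * X + (b * A + c * A @ A) @ X
        return X.T if transposed else X

    O = newton_schulz(np.random.randn(4, 6))
    print(np.round(O @ O.T, 1))  # roughly the identity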


Interesting to see that they enforce retroactive opt-out for data collection. I wonder how they do that: what if the model has already been trained with your data when you opt out?


You can batch only if you have distinct chats in parallel.


> > if I want to run 20 concurrent processes, assuming I need 1k tokens/second throughput (on each)
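Back-of-the-envelope on those numbers, assuming all 20 streams are independent chats and can therefore be batched:

    n_chats = 20              # distinct conversations decoded in one batch
    per_chat_tok_s = 1_000    # per-stream target from the quote
    print(n_chats * per_chat_tok_s)  # 20,000 tokens/s aggregate decode rate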

