
I've had similar experience both in coding and in non-coding research questions. An LLM will do the first N right and fake its work on the rest.

It even happens when asking an LLM to reformat a document, or asking it to do extra research to validate information.

For example, before a recent trip to another city, I asked Gemini to prepare a list of brewery taprooms with certain information, and I discovered it had included locations that had been closed for years or had just been pop-ups. I asked it to add a link to the current hours for each taproom and remove locations that it couldn't verify were currently open, and it did this for about the first half of the list. For the last half, it made irrelevant changes to the entries and didn't remove any of the closed locations. Of course it enthusiastically reported that it had checked every location on the list.



LLMs are not good at "cycles" - when you have to go over a list and do the same action on each item.

It's like it has ADHD and forgets or gets distracted in the middle.

And the reason for that is that LLMs don't have memory beyond the tokens they process, so as they keep going over the list the context grows with more and more irrelevant information, and they can lose track of why they are doing what they are doing.


It would be nice if the tools we usually use for LLMs had a bit more programmability. In this example, we could imagine being able to chunk up the work by processing a few items, then reverting to a previously saved LLM checkpoint of state, and repeating until the list is complete.

I imagine that the cost of saving & loading the current state must be prohibitively high for this to be a normal pattern, though.
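Today's tools don't expose true model-state checkpoints, but re-sending the same fixed prompt prefix for every chunk (which prefix caching keeps cheap) approximates the idea. A minimal sketch, assuming a generic llm_complete(prompt) completion function and a made-up prompt, neither of which is a real API:

    # Hypothetical sketch: process a long list in small chunks, restarting from
    # the same fixed "checkpoint" prefix for every chunk so earlier items never
    # bloat the context.

    CHECKPOINT_PROMPT = (
        "You are verifying brewery taprooms. For each item, confirm it is "
        "currently open and add a link to its hours. Return one line per item."
    )

    def process_list(items, llm_complete, chunk_size=5):
        """llm_complete(prompt) -> str stands in for any chat-completion API."""
        results = []
        for i in range(0, len(items), chunk_size):
            chunk = items[i:i + chunk_size]
            # Every call starts from the same short prefix plus only the
            # current chunk, instead of the whole growing history.
            prompt = CHECKPOINT_PROMPT + "\n\n" + "\n".join(chunk)
            results.append(llm_complete(prompt))
        return results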


Agreed. You basically want an LLM to have a tool that writes its own agent to accomplish a repetitive task. I think this is doable.
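One way to picture that tool (all names here are hypothetical, not any real API): the model hands a per-item instruction to a harness once, and ordinary code does the looping, so item 50 gets the same fresh treatment as item 1.

    # Hypothetical harness for a "spawn a repetitive sub-agent" tool. The
    # parent LLM calls this once; the loop itself is plain code.

    def run_repetitive_agent(instruction, items, llm_complete):
        """instruction: what to do to each item; llm_complete: any completion API."""
        outputs = []
        for item in items:
            # Fresh, minimal context per item -- no accumulated history to drift in.
            outputs.append(llm_complete(f"{instruction}\n\nItem:\n{item}"))
        return outputs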


You can already sort of do this by asking it to write a script to do the refactor. Claude sometimes even suggests this on its own.

But obviously sometimes larger refactors aren't easy to implement in bash.
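For refactors that do fit in a script, the generated code tends to look something like this (an illustrative rename sketched in Python rather than bash; the identifiers and paths are made up):

    # Illustrative refactor script of the kind an LLM might generate instead
    # of editing files one by one: rename a function across a source tree.
    import re
    from pathlib import Path

    OLD, NEW = "fetch_user", "load_user"   # hypothetical identifiers

    for path in Path("src").rglob("*.py"):
        text = path.read_text()
        updated = re.sub(rf"\b{OLD}\b", NEW, text)
        if updated != text:
            path.write_text(updated)
            print(f"updated {path}")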


Right - and ideally, after writing the script to do the task, it could discard all the tokens involved in writing the script.


Right.

In a recent YouTube interview Karpathy claimed that LLMs have a lot more "working memory" than a human:

https://www.youtube.com/watch?v=hM_h0UA7upI&t=1306s

What I assume he's talking about is internal activations such as those stored in the KV cache, which have the same lifetime as the tokens in the input, but this really isn't the same as "working memory" since these are tied to the input and don't change.

What an LLM seems to need to do better at these sorts of iterative/sequencing tasks is a real working memory with a more arbitrary, task-duration lifetime that could be updated (vs. the fixed KV cache), and would allow it to track progress or, more generally, maintain context (in the everyday sense, not the LLM sense) over the course of a task.

I'm a bit surprised that this type of working memory hasn't been added to the transformer architecture. It seems it could be as simple as a fixed (non-shifting) region of the context that the LLM could learn to read and write during training to assist with these types of tasks.
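As a toy illustration of that idea (purely a sketch under my own assumptions, not any shipped architecture): reserve a block of learned memory slots that tokens can read via attention, and update the slots with a gated write that can be carried across steps.

    # Toy sketch of a fixed read/write memory region alongside ordinary
    # attention. Purely illustrative -- not any production architecture.
    import torch
    import torch.nn as nn

    class MemoryAugmentedAttention(nn.Module):
        def __init__(self, d_model=64, n_heads=4, n_slots=8):
            super().__init__()
            # Fixed, non-shifting memory slots, independent of the token window.
            self.init_memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
            self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.write_gate = nn.Linear(2 * d_model, d_model)

        def forward(self, x, memory=None):
            # x: (batch, seq, d_model); memory: (batch, n_slots, d_model)
            if memory is None:
                memory = self.init_memory.unsqueeze(0).expand(x.size(0), -1, -1)
            # Read: tokens attend over [memory ; tokens] rather than tokens alone.
            kv = torch.cat([memory, x], dim=1)
            out, _ = self.read_attn(x, kv, kv)
            # Write: slots summarize the tokens, and a learned gate decides how
            # much of that summary to mix into the existing slot contents.
            summary, _ = self.write_attn(memory, x, x)
            gate = torch.sigmoid(self.write_gate(torch.cat([memory, summary], dim=-1)))
            new_memory = gate * summary + (1 - gate) * memory
            return out, new_memory

    # Carry the updated memory across chunks of a long task:
    block = MemoryAugmentedAttention()
    out, mem = block(torch.randn(2, 16, 64))
    out, mem = block(torch.randn(2, 16, 64), memory=mem)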

An alternative to using embeddings as working memory is to use an external text file (cf. a TODO list or working notes) for this purpose, which is apparently what Claude Code does to maintain focus over long periods of time. I recently saw it mentioned that the Claude model itself has been trained to read and write this sort of text memory file.
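A crude version of that external-notes pattern, with the file name and helper functions made up for illustration: progress lives in a plain text file, so nothing depends on what is still in the context window.

    # Crude sketch of an external TODO file as task memory. The file, not the
    # context window, is the source of truth for what is done and what remains.
    from pathlib import Path

    TODO = Path("todo.txt")   # hypothetical working-notes file

    def next_pending():
        for line in TODO.read_text().splitlines():
            if line.startswith("[ ] "):
                return line[4:]
        return None

    def mark_done(item):
        text = TODO.read_text().replace(f"[ ] {item}", f"[x] {item}", 1)
        TODO.write_text(text)

    def run(llm_complete):
        while (item := next_pending()) is not None:
            # Each step sees only the single pending item; progress is recorded
            # outside the context window.
            llm_complete(f"Handle this item and report the result:\n{item}")
            mark_done(item)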


Which is annoying, because that is precisely the kind of boring, rote programming task I want an LLM to do for me, to free up my time for more interesting problems.


So much for Difference and Repetition.


Surprised and a bit delighted to see a Deleuze reference on HN...



