
I've had similar experience both in coding and in non-coding research questions. An LLM will do the first N right and fake its work on the rest.

It even happens when asking an LLM to reformat a document, or asking it to do extra research to validate information.

For example, before a recent trip to another city, I asked Gemini to prepare a list of brewery taprooms with certain information, and I discovered it had included locations that had been closed for years or had just been pop-ups. I asked it to add a link to the current hours for each taproom and remove locations that it couldn't verify were currently open, and it did this for about the first half of the list. For the last half, it made irrelevant changes to the entries and didn't remove any of the closed locations. Of course it enthusiastically reported that it had checked every location on the list.



LLMs are not good at "cycles" - when you have to go over a list and do the same action on each item.

It's like it has ADHD and forgets or gets distracted in the middle.

And the reason for that is that LLMs don't have memory beyond the tokens they process, so as they keep going over the list the context grows with more and more irrelevant information, and they can lose track of why they are doing what they are doing.


It would be nice if the tools we usually use for LLMs had a bit more programmability. In this example, we could imagine being able to chunk up the work by processing a few items, then reverting to a previously saved LLM checkpoint of state, and repeating until the list is complete.

I imagine that the cost of saving & loading the current state must be prohibitively high for this to be a normal pattern, though.
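Today's tools don't expose true model-state checkpoints, but re-sending the same fixed prompt prefix for every chunk (which prefix caching keeps cheap) approximates the idea. A minimal sketch, assuming a generic llm_complete(prompt) completion function and a made-up prompt, neither of which is a real API:

    # Hypothetical sketch: process a long list in small chunks, restarting from
    # the same fixed "checkpoint" prefix for every chunk so earlier items never
    # bloat the context.

    CHECKPOINT_PROMPT = (
        "You are verifying brewery taprooms. For each item, confirm it is "
        "currently open and add a link to its hours. Return one line per item."
    )

    def process_list(items, llm_complete, chunk_size=5):
        """llm_complete(prompt) -> str stands in for any chat-completion API."""
        results = []
        for i in range(0, len(items), chunk_size):
            chunk = items[i:i + chunk_size]
            # Every call starts from the same short prefix plus only the
            # current chunk, instead of the whole growing history.
            prompt = CHECKPOINT_PROMPT + "\n\n" + "\n".join(chunk)
            results.append(llm_complete(prompt))
        return results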


Agreed. You basically want an LLM to have a tool that writes its own agent to accomplish a repetitive task. I think this is doable.
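One way to picture that tool (all names here are hypothetical, not any real API): the model hands a per-item instruction to a harness once, and ordinary code does the looping, so item 50 gets the same fresh treatment as item 1.

    # Hypothetical harness for a "spawn a repetitive sub-agent" tool. The
    # parent LLM calls this once; the loop itself is plain code.

    def run_repetitive_agent(instruction, items, llm_complete):
        """instruction: what to do to each item; llm_complete: any completion API."""
        outputs = []
        for item in items:
            # Fresh, minimal context per item -- no accumulated history to drift in.
            outputs.append(llm_complete(f"{instruction}\n\nItem:\n{item}"))
        return outputs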


You can already sort of do this by asking it to write a script to do the refactor. Claude sometimes even suggests this on its own.

But obviously sometimes larger refactors aren't easy to implement in bash.
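For refactors that do fit in a script, the generated code tends to look something like this (an illustrative rename sketched in Python rather than bash; the identifiers and paths are made up):

    # Illustrative refactor script of the kind an LLM might generate instead
    # of editing files one by one: rename a function across a source tree.
    import re
    from pathlib import Path

    OLD, NEW = "fetch_user", "load_user"   # hypothetical identifiers

    for path in Path("src").rglob("*.py"):
        text = path.read_text()
        updated = re.sub(rf"\b{OLD}\b", NEW, text)
        if updated != text:
            path.write_text(updated)
            print(f"updated {path}")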


Right - and ideally, after writing the script to do the task, it could discard all the tokens involved in writing the script.


Right.

In a recent YouTube interview Karpathy claimed that LLMs have a lot more "working memory" than a human:

https://www.youtube.com/watch?v=hM_h0UA7upI&t=1306s

What I assume he's talking about is internal activations such as those stored in the KV cache, which have the same lifetime as the tokens in the input, but this really isn't the same as "working memory" since these are tied to the input and don't change.

What an LLM seems to need to do better at these sorts of iterative/sequencing tasks is a real working memory with a more arbitrary, task-duration lifetime that could be updated (vs. the fixed KV cache), and would allow it to track progress or, more generally, maintain context (in the everyday sense, not the LLM sense) over the course of a task.

I'm a bit surprised that this type of working memory hasn't been added to the transformer architecture. It seems it could be as simple as a fixed (non-shifting) region of the context that the LLM could learn to read and write during training to assist with these types of tasks.
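As a toy illustration of that idea (purely a sketch under my own assumptions, not any shipped architecture): reserve a block of learned memory slots that tokens can read via attention, and update the slots with a gated write that can be carried across steps.

    # Toy sketch of a fixed read/write memory region alongside ordinary
    # attention. Purely illustrative -- not any production architecture.
    import torch
    import torch.nn as nn

    class MemoryAugmentedAttention(nn.Module):
        def __init__(self, d_model=64, n_heads=4, n_slots=8):
            super().__init__()
            # Fixed, non-shifting memory slots, independent of the token window.
            self.init_memory = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)
            self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.write_gate = nn.Linear(2 * d_model, d_model)

        def forward(self, x, memory=None):
            # x: (batch, seq, d_model); memory: (batch, n_slots, d_model)
            if memory is None:
                memory = self.init_memory.unsqueeze(0).expand(x.size(0), -1, -1)
            # Read: tokens attend over [memory ; tokens] rather than tokens alone.
            kv = torch.cat([memory, x], dim=1)
            out, _ = self.read_attn(x, kv, kv)
            # Write: slots summarize the tokens, and a learned gate decides how
            # much of that summary to mix into the existing slot contents.
            summary, _ = self.write_attn(memory, x, x)
            gate = torch.sigmoid(self.write_gate(torch.cat([memory, summary], dim=-1)))
            new_memory = gate * summary + (1 - gate) * memory
            return out, new_memory

    # Carry the updated memory across chunks of a long task:
    block = MemoryAugmentedAttention()
    out, mem = block(torch.randn(2, 16, 64))
    out, mem = block(torch.randn(2, 16, 64), memory=mem)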

An alternative to using embeddings as working memory is to use an external text file (cf. a TODO list or working notes) for this purpose, which is apparently what Claude Code does to maintain focus over long periods of time. I recently saw it mentioned that the Claude model itself has been trained to read and write this sort of text memory file.
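A crude version of that external-notes pattern, with the file name and helper functions made up for illustration: progress lives in a plain text file, so nothing depends on what is still in the context window.

    # Crude sketch of an external TODO file as task memory. The file, not the
    # context window, is the source of truth for what is done and what remains.
    from pathlib import Path

    TODO = Path("todo.txt")   # hypothetical working-notes file

    def next_pending():
        for line in TODO.read_text().splitlines():
            if line.startswith("[ ] "):
                return line[4:]
        return None

    def mark_done(item):
        text = TODO.read_text().replace(f"[ ] {item}", f"[x] {item}", 1)
        TODO.write_text(text)

    def run(llm_complete):
        while (item := next_pending()) is not None:
            # Each step sees only the single pending item; progress is recorded
            # outside the context window.
            llm_complete(f"Handle this item and report the result:\n{item}")
            mark_done(item)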


Which is annoying, because that is precisely the kind of boring, rote programming task I want an LLM to do for me, to free up my time for more interesting problems.


So much for Difference and Repetition.


Surprised and a bit delighted to see a Deleuze reference on HN...



