If you watch how agents attempt a task, fail, try to figure out what went wrong, try again, repeat a couple more times, then finally succeed -- don't you see the similarity?
LLMs don't do this. They can't think. If you just use one for like five minutes it's obvious that just because the text on the screen says "Sorry, I made a mistake, there are actually 5 r's in strawberry", that doesn't mean there's any thought behind it.
I mean, you can literally watch their thought process. They try to figure out why something went wrong and then identify solutions, often in ways that require real deduction and creativity, and they have quite a high success rate.
If that's not thinking, then I don't know what is.
You just haven't added the right tools together with the right system/developer prompt. Add an `add_memory` and a `list_memory` tool (or automatically inject the right memories for the right prompts/LLM responses) and you have something that can learn.
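Something like this, roughly. This is just a minimal sketch assuming an OpenAI-style function-calling setup and a plain JSON file as the store; the tool names are the ones above, the file name and handlers are purely illustrative:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memories.json")  # hypothetical storage location

def add_memory(text: str) -> str:
    """Append one lesson/memory string to the store."""
    memories = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    memories.append(text)
    MEMORY_FILE.write_text(json.dumps(memories, indent=2))
    return "memory saved"

def list_memory() -> str:
    """Return all stored memories as one block of text for the model to read."""
    if not MEMORY_FILE.exists():
        return "no memories yet"
    return "\n".join(json.loads(MEMORY_FILE.read_text()))

# Tool schemas in the JSON-schema shape most chat/tool-calling APIs accept.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "add_memory",
            "description": "Store a lesson learned for future tasks.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "list_memory",
            "description": "Retrieve previously stored lessons.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]
```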
You can also take it a step further and add automatic fine-tuning once you start gathering a ton of data, which will rewire the model somewhat.
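I.e., every time a mistake gets corrected, append the corrected exchange to a dataset you can later fine-tune on. A rough sketch, assuming the chat-style JSONL format most fine-tuning endpoints accept; the file path and helper name are made up:

```python
import json

DATASET = "finetune_data.jsonl"  # hypothetical output path

def record_correction(user_prompt: str, corrected_answer: str) -> None:
    """Append one (prompt, fixed answer) pair as a training example."""
    example = {
        "messages": [
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": corrected_answer},
        ]
    }
    with open(DATASET, "a") as f:
        f.write(json.dumps(example) + "\n")
```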
I guess it depends on what you understand "learn" to mean.
But in my mind, if I tell the LLM to do something and it does it wrong, then I ask it to fix it, and if in the future I ask for the same thing and it avoids the mistake it made the first time, then I'd say it has learned to avoid that pitfall. I know very well it hasn't "learned" the way a human would; I just added the correction to the right place. But for all intents and purposes, it "learned" how to avoid the same mistake.
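"Added it to the right place" means something like prepending the stored lessons to the system prompt on the next run. A tiny sketch, assuming the same hypothetical memories.json store as in the earlier sketch:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("memories.json")  # same hypothetical store as above

def build_system_prompt(base_prompt: str) -> str:
    """Prepend stored lessons so the next request starts with the earlier fix."""
    if not MEMORY_FILE.exists():
        return base_prompt
    lessons = "\n".join(json.loads(MEMORY_FILE.read_text()))
    return f"{base_prompt}\n\nLessons from past mistakes:\n{lessons}"
```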