He also said other things about LLMs that turned out to be either wrong or easil...

tonii141 · 2025-11-12T09:35:01 1762940101

a) Still true: vanilla LLMs can’t do math, they pattern-match unless you bolt on tools.

b) Still true: next-token prediction isn’t planning.

c) Still true: error accumulation is mitigated, not eliminated. Long-context quality still relies on retrieval, checks, and verifiers.

Yann’s claims were about LLMs as LLMs. With tooling, you can work around limits, but the core point stands.

killerstorm · 2025-11-13T10:43:18 1763030598

LeCun's argument was that a single erroneous token would derail further response.

This is, obviously, false: a reasoning model (or a non-reasoning one with a better prompt) can recognize error and choose a different path, the error will not be the part of an answer.

You're talking about a different problem: context rot. It's possible that an error would make performance worse. So what?

People can also get tired when they are solving a complex problem. People use various mitigations: e.g. it might help to start from a clean sheet. These mitigations might also apply to LLM: e.g. you can do MCTS (tree-of-thought) or just edit reasoning trace replacing the faulty part.

"LLMs are not absolutely perfect and require some algorithms on top thus we need a completely different approach" is a very weird way to make a conclusion.

killerstorm · 2025-11-12T10:59:41 1762945181

My man, math is pattern matching, not magic. So is logic. And computation.

Please learn the basics before you discuss what LLMs can and can't do.

ozgrakkurt · 2025-11-12T12:31:55 1762950715

I'm no expert on math but "math is pattern matching" really sounds wrong.

Maybe programming is mostly pattern matching but modern math is built on theory and proofs right?

noddybear · 2025-11-12T13:40:41 1762954841

Nah, its all pattern matching. This is how automated theorem provers like Isabelle are built, applying operations to lemmas/expressions to reach proofs.

staticman2 · 2025-11-12T14:38:56 1762958336

I'm sure if you pick a sufficiently broad definition of pattern matching your argument is true by definition!

Unfortunately that has nothing to do with the topic of discussions, which is the capabilities of LLMs, which may require a more narrow definition of pattern matching.

vbarrielle · 2025-11-12T15:44:17 1762962257

Automated theorem provers are also built around backtracking, which is absent in LLMs.

HarHarVeryFunny · 2025-11-12T20:09:54 1762978194

When an LLM does it, it's pattern matching.

RL training amounts to pattern matching.

How does an LLM decode Base64? Decode algorithm? No - predictive pattern matching.

An LLM isn't predicting what a person thinks - it's predicting what a person does.

NitpickLawyer · 2025-11-12T09:49:44 1762940984

a) no, gemini 2.5 was shown to "win" gold w/o tools. - https://arxiv.org/html/2507.15855v1

b) reductionism isn't worth our time. Planning works in the real world, today. (try any agentic tool like cc/codex/whatever). And if you're set on the purist view, there's mounting evidence from anthropic that there is planning in the core of an LLM.

c) so ... not true? Long context works today.

This is simply moving goalposts and nothing more. X can't do Y -> well, here they are doing Y -> well, not like that.

tonii141 · 2025-11-12T10:16:27 1762942587

a) That "no-tools" win depends on prompt orchestration which can still be categorized as tooling.

b) Next-token training doesn’t magically grant inner long-horizon planners..

c) Long context ≠ robust at any length. Degradation with scale remains.

Not moving goalposts, just keeping terms precise.

ACCount37 · 2025-11-12T11:51:32 1762948292

My man, you're literally moving all the goalposts as we speak.

It's not just "long context" - you demand "infinite context" and "any length" now. Even humans don't have that. "No tools" is no longer enough - what, do you demand "no prompts" now too? Having LLMs decompose tasks and prompt each other the way humans do is suddenly a no-no?

tonii141 · 2025-11-12T12:45:04 1762951504

I’m not demanding anything, I’m pointing out that performance tends to degrade as context scales, which follows from current LLM architectures as autoregressive models.

In that sense, Yann was right.

snapcaster · 2025-11-12T15:55:39 1762962939

Not sure if you're just someone who doesn't want to ever lose an argument or you're actually coping this hard

tonii141 · 2025-11-13T08:25:29 1763022329

I just see a lot of people who’ve put money in the LLM basket and get scared by any reasonable comment about why LLMs aren’t almighty AGIs and may never be. Or maybe they are just dumb, idk.

ACCount37 · 2025-11-13T10:06:09 1763028369

Even the bold take of "LLMs are literally AGI right now" is less of a detour from reality than "LLMs are NEVER going to hit AGI".

We've had LLMs for 5 years now, and billions were put into pushing them to the limits. We are yet to discover any fundamental limitations that would prevent them from going all the way to AGI. And every time someone pops up with "LLMs can never do X", it's followed up by an example of LLMs doing X.

Not that it stops the coping. There is no amount of evidence that can't be countered by increasing the copium intake.

ilaksh · 2025-11-12T09:01:35 1762938095

That's true but I also think despite being wrong about the capabilities of LLMs, LeCun has been right in that variations of LLMs are not an appropriate target for long term research that aims to significantly advance AI. Especially at the level of Meta.

I think transformers have been proven to be general purpose, but that doesn't mean that we can't use new fundamental approaches.

To me it's obvious that researchers are acting like sheep as they always do. He's trying to come up with a real innovation.

LeCun has seen how new paradigms have taken over. Variations of LLMs are not the type of new paradigm that serious researches should be aiming for.

I wonder if there can be a unification of spatial-temporal representations and language. I am guessing diffusion video generators already achieve this in some way. But I wonder if new techniques can improve the efficiency and capabilities.

I assume the Nested Learning stuff is pretty relevant.

Although I've never totally grokked transformers and LLMs, I always felt that MoE was the right direction and besides having a strong mapping or unified view of spatial and language info, there also should somehow be the capability of representing information in a non-sequential way. We really use sequences because we can only speak or hear one sound at a time. Information in general isn't particularly sequential, so I doubt that's an ideal representation.

So I guess I am kind of variations of transformers myself to be honest.

But besides being able to convert between sequential discrete representations and less discrete non-sequential representations (maybe you have tokens but every token has a scalar attached), there should be lots of tokenizations, maybe for each expert. Then you have experts that specialize in combining and translating between different scalar-token tokenizations.

Like automatically clustering problems or world model artifacts or something and automatically encoding DSLs for each sub problem.

I wish I really understood machine learning.

HarHarVeryFunny · 2025-11-12T20:04:00 1762977840

> So the longer you go in your generation, the higher the error rate. so at long contexts the answers become utter garbage.

Not totally wrong. They can self-correct, but it seems context rot will eventually set in.