
> However LLMs are the real deal

Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does. But that’s also kind of the problem, it’s “surprising”, i.e. we don’t really know what happened.

You wouldn’t fly on a jetliner that’s “surprising it flies without disintegrating midair”.



People were pretty surprised by the Wright Flyer. Confidence is built by experience, not theoretical understanding.


Science being iterative, they definitely weren't the first to fly, and not even the first in a heavier-than-air craft. What they did achieve was the first flight in which the pilot had 3-axis control, and the first powered, manned, heavier-than-air flight.


> Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does.

Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

> “surprising”, i.e. we don’t really know what happened.

This “i.e.” reads like a sort of pop-sci conclusion.

We know exactly what happened. We programmed it to perform these calculations. It’s actually rather straightforward elementary mathematics.

But what happens is that so many interdependent calculations grow the complexity of the problem until we are unable to hold it in our minds, and to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

As for its effectiveness, familiarity with the field of computational complexity points to high dimensional polynomial optimization problems being broadly universal solvers.
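
To make that concrete, here's a minimal sketch (toy sizes, random weights, nothing like a real model) of the kind of calculation involved: each individual step is elementary arithmetic, a matrix multiply plus a nonlinearity, and it's only the composition of many such steps that we can't hold in our heads.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, W, b):
        # One fully connected layer: an affine map followed by a ReLU.
        return np.maximum(0.0, x @ W + b)

    d = 8                    # toy hidden size; real models use thousands
    x = rng.normal(size=d)   # stand-in for a token's hidden state
    for _ in range(4):       # real models stack dozens of layers
        W = rng.normal(size=(d, d)) / np.sqrt(d)
        x = layer(x, W, np.zeros(d))

    print(x)  # each step is elementary; the composition is what resists analysis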


> Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority. Early versions, GPT-1/2, mostly returned complete nonsense. It was only with GPT-3, when the model got large enough, that it started returning results that are convincing and might even make sense often enough.

Even more mind-boggling is the fact that randomness is part of its algorithm, i.e. temperature, and that without it the output is kind of meh.
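
For what it's worth, here's a minimal sketch of what "guess the most likely next word" plus temperature means in practice. The vocabulary and logits are made up; a real model derives the logits from the preceding context.

    import numpy as np

    rng = np.random.default_rng(0)

    vocab  = ["cat", "dog", "the", "sat", "mat"]      # toy vocabulary
    logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0])    # made-up model scores

    def sample_next(logits, temperature=1.0):
        # Low temperature approaches greedy argmax; high temperature
        # flattens the distribution toward uniform randomness.
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return vocab[rng.choice(len(vocab), p=probs)]

    print(sample_next(logits, temperature=0.2))  # almost always "cat"
    print(sample_next(logits, temperature=2.0))  # far more varied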


> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you took the same amount of data as GPT-3+ but scrambled its tokenization before training, THEN I would agree with you that its current behaviour is surprising. But the model was fed data with large swaths that are literal question-and-answer constructions. Its overfitting behavior is largely why its parent company is facing so much legal backlash.

> Even more mind-boggling is the fact that randomness is part of its algorithm

The randomness is for token choice rather than any training-time tunable, so fails to support the "i.e. we don’t really know what happened" sentiment. We do know: we told it to flip a coin, and it did.

> i.e. temperature, and that without it the output is kind of meh.

Both without it and with it. You can turn up the temperature and get bad results, just as you can turn it down and get bad results.
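
A toy illustration of that point (again with made-up logits): very low temperature collapses the distribution onto one token, very high temperature flattens it toward uniform, and neither extreme produces good text.

    import numpy as np

    logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0])  # made-up scores over 5 tokens

    def softmax_with_temperature(logits, T):
        z = logits / T
        p = np.exp(z - z.max())
        return p / p.sum()

    for T in (0.1, 1.0, 10.0):
        print(T, np.round(softmax_with_temperature(logits, T), 3))
    # T=0.1  -> nearly all probability on the top token (repetitive output)
    # T=1.0  -> a spread that still prefers likely tokens
    # T=10.0 -> close to uniform over the vocabulary (incoherent output)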

If adding a single additional dimension to the polynomial of the solution space turned a nondeterministic problem into a deterministic one, then yes, I would agree with you, that would be surprising.


> so fails to support the "i.e. we don’t really know what happened" sentiment

It's less that we don't know what's happening on a micro level, and more that it's surprising it produces anything coherent at all on a macro level, especially with a (necessary) element of randomness in the process.

For the most part we don't seem particularly knowledgeable about what happens at the macro level. Hallucinations remain an unsolved problem. AI companies can't even make their "guardrails" bulletproof.


If you believe LLMs are fully explainable you should write a paper and submit to arxiv.


I think this is an uncharitable reading of this thread.

I’m arguing against the breathless use of “surprising”.

My gp explains what I think you overlooked in this dismissive response.

> to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

Explainable but intractable is still far from surprising for me.


> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you read through what Hinton or any of his famous students have said, it genuinely was and is surprising. Everything from AlexNet to the jump from GPT-2 to GPT-3 was surprising. We can't actually explain that jump in a formal way, only make reasonable guesses. If something is unexplainable, it's unpredictable. Prediction without understanding is a vague guess, and the results will come as a surprise.


>Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

Lol researchers were surprised by the mostly incoherent nonsense pre-transformer RNNs were spouting years ago, never mind the near-perfect coherency of later GPT models. To argue otherwise is just plain revisionism.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


From your linked post:

> What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me.

This reads more like humanizing the language of the post than any legitimate surprise from the author.

The rest of the post then goes into great detail showing that “we DO really know what happened”, to paraphrase the definition the OP provides for their use of “surprise”.

> Conclusion We’ve learned about RNNs, how they work, why they have become a big deal, we’ve trained an RNN character-level language model on several fun datasets, and we’ve seen where RNNs are going.

I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

> In fact, it is known that RNNs are Turing-Complete in the sense that they can to simulate arbitrary programs (with proper weights).

Mathematically proven to be able to do something is about as far from surprise as one can get.


>This reads more like humanizing the language of the post than any legitimate surprise from the author.

Lol Sure

>I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

We don't know what the models learn and what they employ to aid in predictions. That is a fact. Going on a gradient descent rant is funny but ultimately meaningless. It doesn't tell you anything about the meaning of the computations.

There is no misplaced reverence for incomprehensibility, because the internals, and how they meaningfully shape predictions, are incomprehensible.

>Mathematically proven to be able to do something is about as far from surprise as one can get.

Magic: The Gathering is Turing complete. I'm sorry, but "theoretically Turing complete" is about as meaningless as it gets. Transformers aren't even Turing complete.


Unless you’re using an LLM to fly a plane, your analogy is a woefully bad comparison.


I guess my point is, if we don’t understand it, we don’t know its failure scenarios.


This is the case with all neural nets/black-box AI models, and a lot of them are used in various industries.


What things are not understood about transformers?


All the uses


There's an LLM for that.


> Weird thing is it was designed to model language.

Not exactly. They are designed to perform natural language processing (NLP) tasks. That includes understanding language and answering questions.



