
People have to stop treating LLMs and transformer models like they're the solution to everything. Never before have I seen so many academics jump on a hype (I'm an academic too). Honestly, it feels like mostly clout seeking and a way to improve citation numbers.


Transformers are a primitive we have not fully figured out yet. So it’s perfectly fine to see where this primitive fits.

For example, this week several papers on time series forecasting indicated they may be useful there.

For example, they seem to do a better job at translation than previous approaches.

For example they seem to do a better job at transcription than previous approaches.

Probably will do better on OCR than previous approaches.

Probably have flaws, which we may or may not overcome, that limit their use in scenarios requiring high precision.

Possibly will do better on autonomous decision making ("does x include a privacy leak that should be investigated?") than previous approaches (keyword scanning).

We are in a technological wave of discovery and experimentation; calls for restraint of curiosity and research betray fears.


It goes way beyond that. Transformers are the first practical, scalable, general-purpose differentiable (i.e. trainable with gradient descent) algorithm. We haven't come close to seeing the limits of what they can do, because everything so far points to the fact that their only limit is our current hardware. And hardware is improving at a much faster and steadier rate than algorithms in computer science these days.


everything? they've solved reinforcement learning? they can handle continuous domains, like robot motion? that's funny, I thought they could only handle sequences of tokens.*

yes, they're exciting, and they are the most general architecture we've found so far, but there are important problems in AI (like anything continuous) that they're really not suited for.

I think there's better architectures out there for many tasks, and I'm a little dismayed that everyone seems to be cargo-culting the GPT architecture rather than taking the lessons from transformers and experimenting with more specialized algorithms.

*btw they don't need quantized tokens; there's no reason they can't just work on continuous vectors directly, and they don't have to be causal or limited to one sequence. But "transformer" seems to mean GPT in everyone's mind, and even though the original transformer was an encoder-decoder model, we rarely seem to see those these days for some reason.
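To make that concrete, here's a minimal sketch of a transformer working on continuous vectors directly, assuming PyTorch (positional encodings omitted for brevity; the class name and shapes are just for illustration):

    import torch
    import torch.nn as nn

    # Toy encoder over continuous vectors: no vocabulary, no quantization,
    # just a linear projection of raw features into the model dimension.
    class ContinuousEncoder(nn.Module):
        def __init__(self, in_dim, d_model=64, n_heads=4, n_layers=2, out_dim=1):
            super().__init__()
            self.proj_in = nn.Linear(in_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)  # bidirectional, non-causal
            self.head = nn.Linear(d_model, out_dim)

        def forward(self, x):  # x: (batch, seq_len, in_dim), e.g. raw sensor readings
            return self.head(self.encoder(self.proj_in(x)))

    model = ContinuousEncoder(in_dim=6)
    y = model(torch.randn(8, 50, 6))  # 8 sequences of 50 continuous 6-d measurements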


>they've solved reinforcement learning?

Transformers can do reinforcement learning, yes; see the rough sketch at the end of this comment.

https://arxiv.org/abs/2106.01345

https://arxiv.org/abs/2205.14953

>they can handle continuous domains, like robot motion?

Yes they can handle it just fine. Excellently in fact.

https://www.deepmind.com/blog/scaling-up-learning-across-man...

https://tidybot.cs.princeton.edu/

https://general-pattern-machines.github.io/

https://wayve.ai/thinking/lingo-natural-language-autonomous-...

I don't know if anyone is saying they're the best at or have "solved" everything but they can damn near do anything.
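Rough sketch of the decision-transformer framing from the first arXiv link, assuming PyTorch (a toy illustration, not the paper's code): RL becomes sequence modeling over interleaved (return-to-go, state, action) embeddings, with each action predicted from its state position under a causal mask.

    import torch
    import torch.nn as nn

    class ToyDecisionTransformer(nn.Module):
        # Toy illustration: interleave (return-to-go, state, action) embeddings
        # and predict the action from each state token under a causal mask.
        def __init__(self, state_dim, act_dim, d_model=64, n_heads=4, n_layers=2, max_len=32):
            super().__init__()
            self.embed_rtg = nn.Linear(1, d_model)
            self.embed_state = nn.Linear(state_dim, d_model)
            self.embed_action = nn.Linear(act_dim, d_model)
            self.pos = nn.Embedding(3 * max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.predict_action = nn.Linear(d_model, act_dim)

        def forward(self, rtg, states, actions):
            # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
            B, T, _ = states.shape
            tokens = torch.stack(
                [self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)],
                dim=2,
            ).reshape(B, 3 * T, -1)  # per timestep: [R_t, s_t, a_t]
            tokens = tokens + self.pos(torch.arange(3 * T))
            mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
            h = self.encoder(tokens, mask=mask)
            return self.predict_action(h[:, 1::3])  # action predicted from each state token

    model = ToyDecisionTransformer(state_dim=17, act_dim=6)
    a_hat = model(torch.randn(4, 10, 1), torch.randn(4, 10, 17), torch.randn(4, 10, 6))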


I'm not sure about the last assertion. How would you even compare them? It is easier to deploy newer algorithms (software) than hardware.


There are two ways you can solve currently intractable problems: find better algorithms or improve the hardware. It's actually insanely hard to come up with new algorithms; that is why machine learning and AI were lagging behind most of computer science for decades.


> calls for restraint of curiosity and research betray fears

That kind of rhetoric is dangerously close to the cryptocurrency shills saying all critics are people who are annoyed because they “missed the boat” and didn’t get rich. It’s the kind of generic comment which can be used to discredit anything the interlocutor wants.


"There is more where this came from" reads differently based on how interesting/valuable "this" is. Cryptocurrency wasn't producing anything interesting beyond new flavors of old scams, and renewed appreciation for financial regulation. Transformer models and LLMs in particular seem to be blazing through any problem we throw at them, including previously unsolved problems, and every marginal improvement is instantly useful (both in terms of capabilities and in terms of $$$ produced). We hit the motherlode here - and it shows no sign of running out just yet.

That's the difference.

And even when we hit the limit of improvement on those models, when scaling up won't make a qualitative difference, we can expect many years of further breakthroughs in applications, as R&D focuses on less obvious applications and making more efficient use of the capabilities available.


> "There is more where this came from"

That’s not the part of the comment I had an issue with, which is why it’s not the part I quoted. What I commented on is the end, which implies negative intentions on the part of the original commenter.


Makes sense, there's an awful lot of overlap between the ex-crypto community and the AI community.


Research and curiosity are miles away from the “get instantly rich” and “solution in search of a problem” we’ve seen in crypto.

Given the context of the original comment this was a response to, you’d have to take things out of context to construe this as a generic comment.


> Research and curiosity are miles away from the “get instantly rich” and “solution in search of a problem” we’ve seen in crypto.

And what the original comment boils down to is “let’s not make LLMs and transformers solutions in search of problems, let’s not try to fit them to solve everything”. It seems you might agree.

My issue with the response has nothing to do with specific technologies, but that it painted another view as having an agenda (fear). That is the generic defence. You take something you believe and then say those who disagree do so due to <negative connotation>.

By the way, this is not the point but there are plenty of “solution in search of a problem” and “get instantly rich” (including full-on scams) cases in the current wave of AI.


I'm struggling to find where these authors are treating LLMs and transformers like they're the solution to everything, or where they are just following the hype. These are long-time AI researchers at Berkeley who've developed a framework for dealing with limited context windows, drawing inspiration from operating systems.


"LLMs as Operating Systems"... —long-time AI researchers at Berkeley


In the paper, they're talking about managing LLM context windows the way operating systems manage memory and files. They're not saying LLMs should be used as operating systems.
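As a toy illustration of the analogy (in Python, and very much not the paper's actual implementation): keep a bounded "main context" that the model sees, page older messages out to external storage, and pull them back in on demand.

    class ToyContextManager:
        # Toy version of the OS analogy: main_context is like RAM (what the LLM sees),
        # archive is like disk (searchable external storage).
        def __init__(self, max_main_tokens=4000):
            self.main_context = []
            self.archive = []
            self.max_main_tokens = max_main_tokens

        def add(self, message):
            self.main_context.append(message)
            while self._token_count() > self.max_main_tokens:
                self.archive.append(self.main_context.pop(0))  # "page out" oldest message

        def recall(self, query):
            # "page in": naive keyword search; a real system would use
            # embeddings and/or summaries instead of substring matching
            return [m for m in self.archive if query.lower() in m.lower()]

        def _token_count(self):
            return sum(len(m.split()) for m in self.main_context)  # crude token proxy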


they should work on their communication skills.


This is unhelpful. People should be expected to read past headlines. The paper's abstract takes 30 seconds to read.


I'm sorry it's unhelpful, but it is how I feel about the article and click-baity title. Downvote all you want.


Always cringeworthy when the average bum coder on HN with zero attention span or intellectual curiosity criticizes papers way out of their intellectual depth without reading further than the title. They are describing a new concept using "operating system" almost as a metaphor. This is a way to describe novel concepts. The article abstract makes the point clear and only takes a few seconds to read.


The branding on this is a bit much; it’s not an operating system. However LLMs are the real deal; some papers claim they are achieving SOTA or significant breakthroughs in various domains. Surely they’re computationally intensive, but if you have read most of the papers out of Berkeley, Microsoft, Meta, Google/DeepMind/Waymo… I think you’d have a change in opinion.


> However LLMs are the real deal

Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does. But that’s also kind of the problem: it’s “surprising”, i.e. we don’t really know what happened.

You wouldn’t fly on a jetliner that’s “surprising it flies without disintegrating midair”.


People were pretty surprised by the Wright flyer. Confidence is built by experience, not theoretical understanding.


Science being iterative, they definitely weren't the first to fly, and not even the first in a heavier-than-air craft. What they did achieve was the first time the pilot had 3-axis control, and the first powered, heavier-than-air manned flight.


> Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does.

Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

> “surprising”, i.e. we don’t really know what happened.

This "i.e." reads like a sort of pop-sci conclusion.

We know exactly what happened. We programmed it to perform these calculations. It’s actually rather straightforward elementary mathematics.

But what happens is that so many interdependent calculations grow the complexity of the problem until we are unable to hold it in our minds, and to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

As for its effectiveness, familiarity with the field of computational complexity points to high dimensional polynomial optimization problems being broadly universal solvers.


> Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority. Early versions, GPT-1/2, returned mostly complete nonsense. It was only with GPT-3, when the model got large enough, that it started returning results that are convincing and might even make sense often enough.

Even more mind boggling is the fact that randomness is part of its algorithm, i.e. temperature, and that without it the output is kind of meh.
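Concretely, temperature just rescales the logits before the softmax when the next token is sampled; a rough sketch of the idea (not any particular model's implementation):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=None):
        # Temperature rescales the logits before the softmax.
        # T -> 0 approaches greedy argmax; larger T flattens the distribution.
        rng = rng or np.random.default_rng()
        z = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
        p = np.exp(z - z.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

    # Near-zero temperature tends toward repetitive "greedy" text;
    # very high temperature drifts toward incoherence.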


> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you took the same amount of data for GPT-3+ but scrambled its tokenization before training, THEN I would agree with you that its current behaviour is surprising; but the model was fed data with large swaths that are literal question-and-answer constructions. Its overfitting behavior is largely why its parent company is facing so much legal backlash.

> Even more mind boggling is the fact that randomness is part of its algorithm

The randomness is for token choice rather than any training-time tunable, so fails to support the "i.e. we don’t really know what happened" sentiment. We do know: we told it to flip a coin, and it did.

> i.e. temperature, and that without it the output is kind of meh.

Both without it and with it. You can turn up the temperature and get bad results, just as you can turn it down and get bad results.

If adding a single additional dimension to the polynomial of the solution space turned a nondeterministic problem into a deterministic one, then yes, I would agree with you, that would be surprising.


> so fails to support the "i.e. we don’t really know what happened" sentiment

It's less that we don't know what's happening on a micro level and more that it's surprising that it's producing anything coherent at all on a macro level - especially with a (necessary) element of randomness in the process.

For the most part we don't seem particularly knowledgeable about what happens on a macro level. Hallucinations remain an unsolved problem. AI companies can't even make their "guardrails" bulletproof.


If you believe LLMs are fully explainable you should write a paper and submit to arxiv.


I think this is an uncharitable reading of this thread.

I’m arguing against the breathless use of “surprising”.

My gp explains what I think you overlooked in this dismissive response.

> to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

Explainable but intractable is still far from surprising for me.


> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you read through what Hinton or any of his famous students have said, it genuinely was and is surprising. Everything from AlexNet to the jump from GPT-2 to GPT-3 was surprising. We can't actually explain that jump in a formal way, just reasonable guesses. If something is unexplainable, it's unpredictable. Prediction without understanding is a vague guess and the results will come as a surprise.


>Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers?

Lol researchers were surprised by the mostly incoherent nonsense pre-transformer RNNs were spouting years ago, never mind the near-perfect coherency of later GPT models. To argue otherwise is just plain revisionism.

http://karpathy.github.io/2015/05/21/rnn-effectiveness/


From your linked post:

> What made this result so shocking at the time was that the common wisdom was that RNNs were supposed to be difficult to train (with more experience I’ve in fact reached the opposite conclusion). Fast forward about a year: I’m training RNNs all the time and I’ve witnessed their power and robustness many times, and yet their magical outputs still find ways of amusing me.

This reads more like humanizing the language of the post than any legitimate surprise from the author.

The rest of the post then goes into great detail showing that “we DO really know what happened”, to paraphrase the definition the OP provides for their use of “surprise”.

> Conclusion We’ve learned about RNNs, how they work, why they have become a big deal, we’ve trained an RNN character-level language model on several fun datasets, and we’ve seen where RNNs are going.

I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

> In fact, it is known that RNNs are Turing-Complete in the sense that they can simulate arbitrary programs (with proper weights).

Mathematically proven to be able to do something is about as far from surprise as one can get.


>This reads more like humanizing the language of the post than any legitimate surprise from the author.

Lol Sure

>I am pushing back on people conflating the innate complexity of a high dimensional polynomial with a misplaced reverence of incomprehensibility.

We don't know what the models learn and what they employ to aid in predictions. That is a fact. Going on a gradient-descent rant is funny but ultimately meaningless. It doesn't tell you anything about the meaning of the computations.

There is no misplaced reverence of incomprehensibility, because the internals, and how they meaningfully shape predictions, are incomprehensible.

>Mathematically proven to be able to do something is about as far from surprise as one can get.

Magic: The Gathering is Turing-complete. I'm sorry, but "theoretically Turing-complete" is about as meaningless as it gets. Transformers aren't even Turing-complete.


Unless you’re using an LLM to fly a plane, your analogy is a woefully bad comparison.


I guess my point is, if we don’t understand it, we don’t know its failure scenarios.


This is the case with all neural nets/black box AI models and a lot are used in various industries.


What things are not understood about transformers?


All the uses


There's an LLM for that.


> Weird thing is it was designed to model language.

Not exactly. They are designed to perform natural language processing (NLP) tasks. That includes understanding language and answering questions.


Depends on what "operating system" means. I'd say things like democracy, marketing/journalism/propaganda, etc. are operating systems of some sort, in that they perform orchestration of humans, modify reality, etc. Lack of memory is a big handicap for LLMs if they want to play in that league.


We have found a new hammer and everything very much looks like a nail.

It's perfectly natural and allows us to figure out what does and doesn't work, even if it means sometimes we have to deal with empty hype projects.


This wave of AI hype really brought out the curmudgeons, didn't it? Here we are, with this exciting architecture that is surprisingly applicable to a wide range of problem domains, yet we get "this is like crypto all over again!!1" and "oh everything looks like a nail now, huh?". I mean dang.


Judge it by its merits.

If the object of hype adds useful novelty, the interest could be justified. If, as is often the case, it is not quite known, it's a question to figure out.

Granted, intuition is worth something, but it's still not a certainty, so somebody with a different opinion could still see something useful here.


I would agree, but on the other hand, Transformer (or attention really) based models seem to be the first time that computers are generating ad hoc human text on a variety of topics, so I do believe the hype is justified. I mean... people have spent entire careers in pursuit of this goal, and it's here... as long as what you want to talk about is less than 4096 / some K tokens.

Given how little progress (relatively) was made until transformers, it seems totally reasonable to pursue attention models.


And as long as it is English. How well do they work for other languages with large corpuses?

I know they suck for Serbian, but I wonder what kind of corpus they need to become useful?


FWIW I talk to GPT-3.5 in Spanish frequently and there’s no problem that doesn’t exist in the English version.


Interesting: I do wonder about slightly more complex languages which have declensions and verb "gender" (e.g. in Serbian "pevala" means "(a female) sang", whereas "pevao" means that a male did). Or nouns and adjectives can be in 7 cases: "plavom olovkom" means "with a blue pen", whereas "a blue pen" is just "plava olovka".

ChatGPT always mixes these up, hallucinates a bunch of words (inappropriate prefixes, declensions, etc.) and is very happy to explain the meaning of these imaginary words, and I can imagine smaller, more complex languages like Serbian needing even larger corpuses than English, yet that's exactly the hard part: there is simply less content to go off of.


I agree they are probably being over-applied, but I for one am fascinated and excited for the future and all the possibilities. It’s amazing how much effort is being applied across the board, and failures, to me, just help other research progress.


It deserves to be explored. A lot of research is just applying existing techniques to problems with a few tweaks and documenting the results.

I did not read the paper, but it could be that the authors find it is not effective.


Sure, in the initial blockchain hype I remember people wanted to build an OS.


Some of those “OS”es did launch, but they’ve been rebranded as SDKs.


they are very clearly the most powerful technological innovation i've witnessed in my adult life. we don't even know what they're capable of yet, we're like one year into discovering practical usecases and they are improving at a constant dizzying rate. this is the time for exploration and experimentation.


I'm a software developer, I live in the concrete world.

To put it in layman's terms, LLMs are to software developers what power tools were for tradesmen.

Sure you can still use your old screw driver, and for some work, it's not worth getting your electric drill out; but it's a game changer.

Sometimes, the hype is justified. I believe at the OS layers, LLM support would make sense. It's full of old mental constructs and "I have to remember how to do this".


Precisely this. This doesn't solve anything and appears to be absurd techno-solutionism at this point.

Every single problem does not need to be solved using an LLM; this paper shows the amount of desperation in this hype cycle.

Those reading this paper and giving it credibility have fallen for this nonsense and are probably gullible enough to believe in this non use-case.


> Every single problem does not need to be solved using an LLM

How do you know if a problem is suitable for transformers/LLMs unless you try? We have this great generalized architecture; I would hope people throw everything at it and see what sticks, because what's the downside? Less focus on bespoke predictive models? Oh no.


Did you read the paper? What exactly do you think it does?


LLMs are a step function change in computers



