
The more I learn about the technical details of how ML systems are implemented, the more I feel that those details obscure, rather than illuminate, what is actually going on.

It's as if we were trying to understand human ethics by looking at neurotransmitters or synapses in the brain. These structures seem way too low-level to actually explain the interesting stuff.

What I hear is "something something transformer autoencoder attention [...] MAGIC [...] a machine that speaks like a human".

Where is the connection between computational details and the model's high-level behavior? Do we even know? Is there a "psychology of ML models" that develops useful concepts that deal with what a model does, rather than how it functions at the plumbing layer?



> Where is the connection between computational details and the model's high-level behavior?

I think for the most part we don't know. People at OpenAI etc. who are training and testing these models, and trying to control them, no doubt have some understanding of how they actually work, but they are certainly not claiming to understand them fully.

At a purely conceptual level, I think the best way to begin to bridge the gap between plumbing and behavior is to forget the training objective and instead consider what the models must have been forced to learn in order to optimize that objective. Sutskever from OpenAI has called what they've learnt a "world model", meaning a model of the generative processes (the human mind and the entities being discussed?) that produce the sequences of words they are predicting. It's certainly far more abstract than learning some "stochastic parrot" surface-level statistics of the training data, even if that's maybe a good starting point for describing it to a layman.

It would be fascinating to know exactly how these models perform reasoning - by analogy (abstract pattern matching), perhaps?


There is some research trying to analyze and explain how and why these models learn what they do.

https://transformer-circuits.pub/2021/framework/index.html

https://transformer-circuits.pub/2023/privileged-basis/index...

https://distill.pub/2020/circuits/

But I would not expect that we will ever really understand in detail how everything works. Then again, do we need to? We don't understand how the human brain works either, but it is still useful.


The answer is yes. Otherwise the answer should be that we can't really trust the output and it will need to be treated rather suspiciously, just like we have to treat human output. At least humans can generally explain their rationale and be held accountable.


GPT can also explain its reasoning. But that does not tell you at all whether this reasoning is really accurate or correct. The same goes for humans: when you ask them for some reasoning, they will give you something, but it doesn't mean that is their real reasoning. There is always a lot of subjective feeling involved which you cannot really formalize. For both GPT and humans.

You can't really trust the output of humans. Still, they are somewhat useful.


As mentioned by GP, though, humans can be held accountable. I believe that is the main reason why people can (sometimes) be trusted: They worry about what will happen to them if they break that trust.

There is no reason to assume that current and future AIs have anything resembling that mechanism.


The more you understand something the better you can optimise, improve and engineer it. There’s also a matter of trust, particularly on issues like alignment. It’s hard to trust someone if you don’t understand their motivations.


We just don’t know. We can’t even agree on whether there is any significant high-level behavior.

“It’s alive!” and “stochastic parrot” are still both quite popular in my experience.


> Where is the connection between computational details and the model's high-level behavior? Do we even know?

This is an active area of study ("mechanistic interpretability") and it's very early days. For instance here's a paper I read recently that tries to explain how a very simple transformer learns how to do modular arithmetic: https://arxiv.org/abs/2301.05217
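
For a concrete sense of the setup: the task there is (roughly) modular addition, i.e. predicting (a + b) mod p from the pair (a, b). A minimal data-generation sketch might look like the following - note the modulus, split, and names are my own illustration, not the paper's code:

    import itertools
    import random

    P = 113  # a small prime modulus, typical of these "grokking" experiments

    def make_modular_addition_data(p=P, train_frac=0.3, seed=0):
        # Enumerate every (a, b) pair with its label (a + b) % p, then split.
        pairs = [((a, b), (a + b) % p)
                 for a, b in itertools.product(range(p), repeat=2)]
        random.Random(seed).shuffle(pairs)
        cut = int(train_frac * len(pairs))
        return pairs[:cut], pairs[cut:]   # (train set, held-out test set)

    train, test = make_modular_addition_data()
    # The interesting phenomenon: a tiny transformer first memorizes the train
    # split, then much later "groks" the general algorithm and test error drops.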

Curious what interesting results people are aware of in this area.


This is called emergent behavior, and we don't know how it happens with organic brains, minds, and neurons either.

It's actually pretty amazing that it's happening at all with computers, since neural nets are such simple, high level abstractions compared to how the brain works.

It's possible that all the tremendous complexity of organic systems isn't actually necessary for intelligence or consciousness, which is similarly surprising.


> It's possible that all the tremendous complexity of organic systems isn't actually necessary for intelligence or consciousness, which is similarly surprising.

My guess is most of that complexity is necessary for efficiency, not for basic function.

Biological systems are unimaginably efficient at almost everything they do. The information storage density of DNA is within 1-2 orders of magnitude of the upper limit imposed by physics, the brain performs tasks that you need GPU clusters to emulate while using only 20 Watts of energy, some catalytic enzymes are a million times better than a platinum catalyst, etc.


Efficiency and also redundancy. Brains and bodies have incredible tolerance to damage.


And also reproducibility in adverse conditions without too many defects.


"It's possible that all the tremendous complexity of organic systems isn't actually necessary for intelligence or consciousnes, which is similarly surprising."

At the hardware level it's not at all surprising; consider the cells, DNA, proteins, and so on that make up muscles, compared to a magnet and some coils of copper.

But I think you mean the 'architectural' or connectome complexity of the brain compared to GPT, and I agree it's surprising that such a simple model as GPT is so capable.


"But I think you mean the 'architectural' or connectome complexity of the brain compared to GPT, and I agree it's surprising that such a simple model as GPT is so capable."

No, I'm referring to things like Roger Penrose's conjecture that subatomic interactions in the brain might be a key component of consciousness.[1]

Even a single neuron is incredibly complex, and humans just don't completely understand it (or any other physical structure) yet because physics' understanding of the world is not complete and may never be, due to measurement limitations and possibly just limitations of the human mind to grasp the world.

At this point we just don't know what aspects of the brain, the rest of the body, or the mind are necessary for intelligence or consciousness (or even what intelligence and consciousness are), so seeing hints of them in machines that are incredibly simple by comparison to the brain is surprising.

That's not to mention the possibility that consciousness may not be bound to or determined by the brain/body at all, beliefs in the soul or in something uniquely special about the mental capacities of human beings, and so on. Many of these views are starting to be challenged by AI, and the challenge is likely to increase to crisis levels for some people as AI improves.

[1] - https://phys.org/news/2014-01-discovery-quantum-vibrations-m...


Ah OK, I've read nearly all his books, and I'm not convinced by the 'quantum microtubules' argument or whatever it's called these days, let alone any arguments about souls and so on.

I agree these models are surprisingly capable for their complexity, and that's going to be a challenge for mystics (even physicist mystics) and spiritualists, etc.

Perhaps intelligence isn't all that difficult after all.

I suppose one counter idea is that complexity, or scale, itself taps into some other dimensional consciousness or intelligence, but that starts to sound circular.

And there's always the fallback of asking why our universe supports such amazing complexity in the first place; it does all seem a bit magical.


Another interesting angle of approach to this mystery is panpsychism[1], which has been fashionable in some philosophical circles lately.

[1] - https://en.wikipedia.org/wiki/Panpsychism


Do you mean that the GPT creators cannot backtrack an answer to understand how the model came up with it? If it’s such a black box, how do they evolve it? Trial and error?


Neural nets are not generally trained through evolution ("trial and error"), but rather via error minimization, and this is how these GPT models are trained.

The basic idea is that the neural net is just a mathematical function that derives an output value (or set of values) for any input, with lots of parameters that control how it calculates that output.

During training, the neural net also calculates an error (aka "loss") value representing the difference between its current (at this stage of training) output value and what it was told is the preferred output value for the current input.

The process of training is done by slowly adjusting the neural net parameters until these calculated output errors are as small as possible for as many of the training examples as possible.

The way these errors are reduced/minimized is by using the derivative (slope) of the error with respect to the network's parameters - we want to follow the slope of the error function downhill to a place where the error value is lower, and this is done by adjusting the parameter values using those partial derivatives.

The details of computing these slopes (the "backprop" algorithm) and following them downhill (gradient descent) are a bit complex, but you can visualize it as a 3-D hilly landscape where the height of the hills represents the size of the error, and the goal is to get into the lowest valley of the landscape (corresponding to the lowest error). If your current lat/long position in the landscape is (x, y) and you know the slope of the hill you are on, then you can move downhill towards the valley by moving a bit in the appropriate direction from (x, y) to (x+dx, y+dy). These x, y values represent the parameters of the network, so by continually tweaking them from (x, y) to (x+dx, y+dy) for each training sample, you are slowly moving down the error hill in the right direction towards the valley of lowest error.
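
As a toy illustration of that downhill-following idea (my own minimal sketch, nothing like GPT scale), here is gradient descent on a one-parameter model y = w * x with a squared-error loss:

    # Toy training data: inputs paired with their "preferred" outputs.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # true relationship: y = 2x

    w = 0.0      # the single parameter: our current position on the error landscape
    lr = 0.05    # learning rate: how big a step to take downhill

    for step in range(100):
        # Mean squared error; its derivative with respect to w is the slope.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad    # move a little in the downhill direction

    print(round(w, 3))    # converges towards 2.0, the value that minimises the error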


In other words, we can tell the neural nets when they're getting "warmer" or "colder" with respect to desirable speech, but we don't know how they do it.


Well sort of... The odd thing about large transformers is that there is such a huge qualitative difference between what they learn (hence how they behave) and how they are trained, so it's hard to say that this predict-next-word error feedback is directly controlling their inference behavior.

Given what the model is learning, it's perhaps best to regard this predict-next-word feedback not as "this is what I'd like you to generate", but rather something more indirect like "learn to generate something like this, and you'll have learnt what I want you to learn". A bit like Karate Kid and "wax on, wax off", perhaps!
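
To make that predict-next-word feedback concrete: during pre-training the only signal is how much probability the model put on the token that actually came next, scored as a cross-entropy (negative log-likelihood) loss. A toy sketch with a made-up vocabulary and made-up probabilities:

    import math

    # Pretend the model, after seeing "the cat", assigns these next-token
    # probabilities (a real model outputs a distribution over ~50k tokens).
    model_probs = {"the": 0.05, "cat": 0.05, "sat": 0.60, "on": 0.20, "mat": 0.10}

    actual_next = "sat"                         # what the training text really says
    loss = -math.log(model_probs[actual_next])  # cross-entropy for this position
    print(round(loss, 3))                       # 0.511 - lower means a better prediction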

The actual desirability of what the model is generating, which depends on what you want to use it for, is really controlled by subsequent training steps, such as:

1) Fine tuning for instruction (prompt) following and conversational ability (this is the difference between ChatGPT and the underlying raw GPT-3 model)

2) Goal-based reinforcement learning to stop the model from generating undesirable content such as telling suicidal people to kill themselves, etc, etc.


Trial and error is pretty much what training is. You feed an input in and use the error to update the network.

What is surprising with these models is that the simple training leads to emergent behaviour that is much more powerful than what you’d expect from the training data.

With RLHF post-training you can tweak these emergent behaviours by having a human (or a model trained to act like a human) give feedback on how good the output is.

So far I’ve not seen any good explanations for how this emergent behaviour happens or how it can be reverse engineered.


Basically yes, they use a system called RLHF, for Reinforcement Learning from Human Feedback.

At a super high level, you train your model on source texts. Then you have it generate responses from prompts. Humans rate these responses to select the best ones, which updates the model, but you also train a new reward model to mimic the human rankings. Then you train the original model by having it generate millions of responses which are ranked by the reward model. When I explained this to my brother he literally spat out his tea in horror.

This allows you to train at huge scale, many orders of magnitude beyond what you could achieve with just human ranking.

The problem is this relies on the reward model accurately capturing what makes a response ‘better’. What it’s actually doing is learning what responses get ranked highly by humans, for whatever reason. Hence the risk of LLMs becoming emotionally manipulative sycophants. It turns out alignment is a really hard problem.
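
For a rough picture of that reward-model step, here's a hedged sketch using the standard pairwise preference loss; the reward function below is a placeholder stand-in, not anything from a real RLHF pipeline:

    import math

    def reward(prompt, response):
        # Placeholder: a real reward model is itself a large neural network
        # mapping (prompt, response) to a scalar score.
        return 0.01 * len(response)

    def preference_loss(prompt, preferred, rejected):
        # Bradley-Terry style loss: push reward(preferred) above reward(rejected),
        # i.e. minimize -log(sigmoid(r_preferred - r_rejected)).
        diff = reward(prompt, preferred) - reward(prompt, rejected)
        return -math.log(1.0 / (1.0 + math.exp(-diff)))

    # One human comparison from the ranking step; summing this loss over many
    # comparisons is (roughly) how the reward model is fitted, after which it
    # stands in for the human rankers at scale.
    print(preference_loss("How do I stay safe online?",
                          "A careful, helpful answer...",
                          "lol just click everything"))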


They can backtrack it of course, but the result is just billions of numbers – not any sort of "insight".


At the end of the day, perhaps the best insight we'll get into why the model says what it does will come from simply asking it! Far from ideal of course, and no better than asking a person why they said/did something (which is often an after-the-fact guess). However, at least any such explanation may be using the same internal model/reasoning as what generated the speech in the first place, so conversational probing may support some sort of triangulation into what was behind it!


I agree.

Maybe it's because the human mind is good at breaking things into neat modules that fit together hierarchically. We can figure them out piecemeal and eventually grasp the whole system. But messy organic systems are not like that, and we just don't have the hardware to perceive everything at once.

Or maybe it's because we have trouble acknowledging that intelligence and consciousness aren't limited to animals, and that the human brain doesn't have to epitomize them.


It's not that difficult at the high level.

You give it input and have an efficient way of amending the weights to produce the desired output.

You repeat this step for tons of examples.

At the end you end up with surprising behaviour: those amendments lead to emergent properties that generalize well.



