> Are they?

They are, actually. It's a kind of revelation when you realise this statement is true. There are people in AI who have thought long and hard about the problem of intelligence and had this realisation. For example: https://www.youtube.com/watch?v=boiW5qhrGH4 Then there is the interview with Hutter himself: https://www.youtube.com/watch?v=E1AxVXt2Gv4

You can even semi-formally derive this from Bayes' rule. Let's consider intelligence to be the ability to predict future data from past data. This requires building a model for the past data that can make predictions. So intelligence produces the most likely model M given data D:

    argmax_M P(M|D)

which from Bayes' rule is:

    argmax_M P(M|D) = argmax_M P(D|M)*P(M)/P(D)

You convert probabilities to information by taking the negative log:

    argmin_M -log(P(M|D)) = argmin_M [ -log(P(D|M)) - log(P(M)) ]

The log(P(D)) term drops out because D is fixed, so it can't influence which M gives the minimum. You can read the above informally as: the most likely model (the most general model) that can make predictions from data D is the one where (the length of the encoding of the model) plus (the length of the encoding of the data using that model) is minimal. This is essentially a statement about compression. I'm sure an AI person could do this better. This is also known as:

https://en.wikipedia.org/wiki/Minimum_description_length

or in philosophy as:

https://en.wikipedia.org/wiki/Occam's_razor
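
To make the equivalence concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the three candidate "models" are coin biases, the priors P(M) are picked by hand, and the data is a short list of flips. It only shows that picking the model with the highest posterior and picking the model with the shortest two-part code (bits for the model plus bits for the data given the model) select the same M.

```python
import math

def unnormalised_posterior(data, bias, prior):
    """P(D|M) * P(M) for a coin with the given bias (P(D) is constant, so it is skipped)."""
    heads = sum(data)
    tails = len(data) - heads
    return prior * bias**heads * (1 - bias)**tails

def description_length(data, bias, prior):
    """Two-part code length in bits: -log2 P(M) + -log2 P(D|M)."""
    heads = sum(data)
    tails = len(data) - heads
    model_bits = -math.log2(prior)
    data_bits = -(heads * math.log2(bias) + tails * math.log2(1 - bias))
    return model_bits + data_bits

# Hypothetical candidate models: coin bias -> prior probability P(M).
models = {0.5: 0.5, 0.25: 0.25, 0.75: 0.25}

# Hypothetical observations: 1 = heads, 0 = tails (6 heads, 2 tails).
data = [1, 1, 0, 1, 1, 1, 0, 1]

best_map = max(models, key=lambda b: unnormalised_posterior(data, b, models[b]))
best_mdl = min(models, key=lambda b: description_length(data, b, models[b]))

for bias, prior in models.items():
    print(f"bias={bias}: {description_length(data, bias, prior):.2f} bits")
print("argmax posterior:", best_map, "| argmin code length:", best_mdl)
```

The fair coin is the "simplest" model here (it costs only 1 bit under this prior), but the biased coin pays for its extra bit by encoding the data more cheaply, so it wins on total length. With fewer flips the balance tips back towards the simpler model, which is the Occam's razor part.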