You can even semi-formally derive this from Bayes' rule. Let's consider intelligence to be the ability to predict future data from past data. This requires building a model for the past data that can make predictions. So intelligence produces the most likely model M given data D:
argmax_M P(M|D)
which from Bayes' rule is:
argmax_M P(M|D) = argmax_M P(D|M)*P(M)/P(D)
You convert probabilities to information by taking the negative log:
argmin_M -log(P(M|D)) = argmin_M [-log(P(D|M)) - log(P(M))]
We dropped the log(P(D)) term because D is fixed, so it cannot influence which M attains the minimum. You can read the above informally as: the most likely model (the most general model) that can make predictions from data D is the one for which (the length of the encoding of the model) plus (the length of the encoding of the data using the model) is minimal. This is essentially a statement about compression. I'm sure an AI person could do this better.
They are, actually. It's a kind of revelation when you realise this statement is true. There are people in AI who have thought long and hard about the problem of intelligence and had this realisation. For example: https://www.youtube.com/watch?v=boiW5qhrGH4 There is also an interview with Hutter himself: https://www.youtube.com/watch?v=E1AxVXt2Gv4
Choosing the model that minimizes the combined encoding length of model and data is also known as: https://en.wikipedia.org/wiki/Minimum_description_length
or in philosophy as:
https://en.wikipedia.org/wiki/Occam's_razor
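To make the argmax/argmin equivalence concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the argument above: the "models" are three hypothetical coin biases, the prior P(M) is made up to favour the simple fair coin, and the function names are invented for this example. It picks a model once by minimizing -log2(P(D|M)) - log2(P(M)) in bits and once by maximizing P(D|M)*P(M), and the two choices should agree.

```python
import math

def code_length_bits(data, theta, prior):
    """-log2 P(M) - log2 P(D|M) for a Bernoulli(theta) model of 0/1 data."""
    ones = sum(data)
    zeros = len(data) - ones
    model_bits = -math.log2(prior)                    # cost of stating the model
    data_bits = -(ones * math.log2(theta)
                  + zeros * math.log2(1.0 - theta))   # cost of the data given the model
    return model_bits + data_bits

def unnormalized_posterior(data, theta, prior):
    """P(D|M) * P(M); P(D) is omitted because it is the same for every M."""
    ones = sum(data)
    zeros = len(data) - ones
    return prior * theta ** ones * (1.0 - theta) ** zeros

def mdl_choice(data, models):
    """Model with the shortest total description length."""
    return min(models, key=lambda t: code_length_bits(data, t, models[t]))

def map_choice(data, models):
    """Model with the highest posterior probability."""
    return max(models, key=lambda t: unnormalized_posterior(data, t, models[t]))

# Hypothetical model class: coin bias -> prior probability (assumed values).
MODELS = {0.5: 0.8, 0.7: 0.1, 0.9: 0.1}

short = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]   # 8 ones, 2 zeros
longer = short * 10                       # same ratio, ten times the evidence

for data in (short, longer):
    print(len(data), mdl_choice(data, MODELS), map_choice(data, MODELS))
```

With only ten observations the cheap-to-state fair coin wins despite the skewed data; with a hundred, the savings in encoding the data outweigh the extra bits spent on the less probable model. That trade-off is the Occam's razor reading of the formula.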