There isn't more information than you started with though. That's an illusion, kind of like procedurally generated terrain. Here's 1K of JavaScript that generates endless maps of islands that all look unique:

https://js1k.com/2015-hypetrain/demo/2324
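
For a sense of how little that takes, here's a rough Python sketch of the same trick (nothing to do with the linked demo's actual code): a seeded heightmap where the seed is the only input that varies per map.

    # A seeded heightmap: coarse random grid + bilinear interpolation,
    # faded toward water at the edges so each map reads as an island.
    import math, random

    def island_map(seed, size=40, grid=5):
        rng = random.Random(seed)
        h = [[rng.random() for _ in range(grid + 1)] for _ in range(grid + 1)]
        rows = []
        for y in range(size):
            row = ""
            for x in range(size):
                gx, gy = x * grid / size, y * grid / size
                ix, iy = int(gx), int(gy)
                fx, fy = gx - ix, gy - iy
                top = h[iy][ix] * (1 - fx) + h[iy][ix + 1] * fx
                bot = h[iy + 1][ix] * (1 - fx) + h[iy + 1][ix + 1] * fx
                height = top * (1 - fy) + bot * fy
                # distance from center pushes the edges underwater
                height -= 2 * math.hypot(x / size - 0.5, y / size - 0.5)
                row += "#" if height > 0.2 else "~"
            rows.append(row)
        return "\n".join(rows)

    print(island_map(seed=1))  # every seed is a "new" island; no new information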

An efficient pruned and quantized model contains about as much information as its file size would suggest.
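
A back-of-envelope check, with illustrative numbers rather than any specific model:

    # A quantized model's information content is bounded by its file size.
    # Numbers below are illustrative, not measurements of a real model.
    params = 7e9             # e.g. a 7B-parameter model
    bits_per_weight = 4      # a common quantization level
    size_bytes = params * bits_per_weight / 8
    print(f"{size_bytes / 2**30:.1f} GiB")  # ~3.3 GiB upper bound on what it can encode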

The ability to train models and have them reproduce terabytes of human knowledge simply shows that this knowledge contains repetition in its underlying semantic structure that can be a target of compression.
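
You can watch the same effect with any general-purpose compressor; a toy Python comparison:

    # Repetitive structure compresses; raw entropy doesn't.
    import os, zlib

    structured = b"the cat sat on the mat. " * 1000
    noise = os.urandom(len(structured))

    print(len(structured), len(zlib.compress(structured)))  # 24000 -> tiny
    print(len(noise), len(zlib.compress(noise)))            # 24000 -> ~24000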

Is there less information in the universe than we think there is?



> There isn't more information than you started with though.

There's the same amount of data, yes. The extra info is the structure of the model itself.

> The ability to train models and have them reproduce terabytes of human knowledge simply shows that this knowledge contains repetition in its underlying semantic structure that can be a target of compression.

Yes, and the interesting part is the "what" of that repetition. What patterns exist in written text?

> Is there less information in the universe than we think there is?

There is both less and more, depending on how you look at it.

Most of the information in the universe is cosmic microwave background radiation. It's literally everywhere. We can't predict exactly what CMBR data will exist at any point in spacetime. It's raw entropy: noise. The prevailing theory is that it's relic radiation from the Big Bang, redshifted ever since by the expansion of the universe. From the most literal perspective, the entire universe is constantly creating more information.

Even though the specifics of the data are unpredictable, the CMBR has an almost completely uniform frequency spectrum and amplitude (temperature). From an inference perspective, the noisy entropy of the entire universe follows a coherent and homogeneous pattern. The better we can model that pattern, the more entropy we can factor out, like data compression.
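
As a toy illustration of "model the pattern, keep only the residual" (a made-up smooth signal plus noise, not real CMBR data):

    # Subtracting a known pattern leaves small residuals that compress
    # far better than the raw samples do.
    import math, random, zlib

    random.seed(0)
    model = [int(120 + 100 * math.sin(i / 50)) for i in range(10_000)]  # the pattern
    data = bytes((m + random.randint(-2, 2)) % 256 for m in model)      # pattern + noise

    raw = zlib.compress(data)
    residual = zlib.compress(bytes((d - m) % 256 for d, m in zip(data, model)))
    print(len(raw), len(residual))  # the residual stream is several times smaller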

This same dynamic applies to human-generated entropy, particularly written language.

If we approach language processing from a literal perspective, we have to contend with the entire set of possible written language. Because natural language allows ambiguity, that set is too large to explicitly model: there are too many unpredictable details. This is why traditional parsing only works on unambiguous, context-free grammars like those of programming languages.
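
For instance, Python's own grammar parses deterministically; natural language offers no such guarantee. A sketch using only the standard library:

    # A programming-language grammar gives every valid string one parse tree.
    import ast

    print(ast.dump(ast.parse("1 + 2 * 3", mode="eval")))  # exactly one tree
    # Contrast: "I saw the man with the telescope" has two legitimate
    # parses, so no fixed grammar can settle it.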

If we approach language processing from an inference perspective, we only have to deal with the set of what has been written. This is what LLMs do. This method factors out enough entropy to be computable, but factoring out that entropy also means factoring out the explicit definitions language is constructed from. LLMs don't get stuck on ambiguity, but they also don't resolve it.
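
A bigram model is the degenerate case of that inference perspective (a toy sketch, nothing like a real LLM's scale): it stores only observed continuations and never consults a grammar, so ambiguity never blocks it and is never resolved.

    # Learn only "what has been written": record each word's observed successors.
    import random
    from collections import defaultdict

    corpus = "the cat sat on the mat the dog sat on the rug".split()
    nxt = defaultdict(list)
    for a, b in zip(corpus, corpus[1:]):
        nxt[a].append(b)                 # observed continuations, nothing more

    random.seed(0)
    word, out = "the", ["the"]
    for _ in range(8):
        if not nxt[word]:                # dead end: the corpus just stops here
            break
        word = random.choice(nxt[word])  # sample from what has been written
        out.append(word)
    print(" ".join(out))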



