Hacker News

This doesn’t track for me. How can text have lower bandwidth but higher meaning-per-bit? How does that jibe with entropy resistance (in an information theoretic sense)?

Text seems worse to me. First of all, binary encodings are a superset of text encodings. But less abstractly, binary enables content-transparent compression and error correction.
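To make the content-transparency point concrete, here's a minimal Python sketch: a binary pipeline can compress and checksum a payload without knowing or caring what the bytes represent.

```python
import zlib

# Any payload is just bytes to a binary pipeline: compression and
# integrity checks work identically whether the content is text or not.
payload = "some structured data, or an image, or anything".encode("utf-8")

compressed = zlib.compress(payload)   # content-transparent compression
checksum = zlib.crc32(payload)        # content-transparent error detection

restored = zlib.decompress(compressed)
assert restored == payload
assert zlib.crc32(restored) == checksum
```

The same two calls work on a JPEG, a database dump, or a UTF-8 novel; nothing in the scheme depends on the payload being human-readable.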

Like other commenters have pointed out, the downside of binary is needing sufficient tooling. Depending on the domain, that can indeed be a downside. But if that critique isn’t relevant for a given context, it’s extremely unlikely that plaintext (ASCII?) is superior.

Text seems more like the answer to a plea for lowest common denominator of tooling.



Human-readability is the ultimate error correction for the most expensive link in the system: the human-in-the-loop.

The information-theoretic justification is that binary's efficiency assumes a perfectly known codec, but the entropy of time destroys codecs (bit rot/obsolescence). Text sacrifices transmission efficiency for semantic recovery - it remains decodable even when the specific tooling is lost, making it the most robust encoding for long-term information survival.


Human-readability isn't a feature of ASCII though. It's a feature of any encoding for which the user has sufficient tooling. Sure, that's an easier bar to clear for ASCII than for binary formats in general. But as I said, as long as you have the tooling, binary is no less readable. (Also, many binary formats will store strings as ASCII or UTF-8, so you can use the strings utility or whatever you want against them.)
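For illustration, here's a rough Python approximation of what `strings` does: scan an opaque blob for runs of printable ASCII. (The minimum-length threshold here is an arbitrary choice for the sketch, not what GNU `strings` actually uses.)

```python
import re

def ascii_strings(blob: bytes, min_len: int = 4):
    """Pull runs of printable ASCII out of an otherwise opaque binary blob,
    roughly like the Unix `strings` utility."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob)]

# A fake binary file: magic bytes, an embedded ASCII string, trailing bytes.
blob = b"\x89PNG\x00\x01" + b"hello, embedded text" + b"\xff\xfe\x00"
print(ascii_strings(blob))  # → ['hello, embedded text']
```

The point being: the embedded text survives even though the container format is binary and unknown to the scanner.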

> the entropy of time destroys codecs (bit rot/obsolescence)

Okay, so you don't mean "entropy" in an information theoretic sense. You're just talking about the decay of time. That's a much more specific claim than your original one, and I grant that it may be true for some use-cases. But you don't need semantic recovery if you don't need to do recovery at all, i.e. if your data format and/or storage medium transparently provide redundancy and/or versioning.
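As a sketch of what "transparent redundancy" could mean, here's a naive triple-replication scheme with per-byte majority voting. This is purely illustrative (real systems use erasure codes or RAID-style parity), but it shows recovery working without any inspection of what the bytes mean:

```python
from collections import Counter

def store(payload: bytes, copies: int = 3):
    # Naive redundancy: keep several independent replicas of the raw bytes.
    return [bytearray(payload) for _ in range(copies)]

def recover(replicas):
    # Majority vote per byte position; survives corruption in a minority
    # of replicas without ever interpreting the content.
    return bytes(Counter(col).most_common(1)[0][0] for col in zip(*replicas))

replicas = store(b"\x00\xff binary or text, the scheme doesn't care")
replicas[1][0] ^= 0xFF  # corrupt one replica
assert recover(replicas) == b"\x00\xff binary or text, the scheme doesn't care"
```

Note that the scheme is indifferent to the payload: it repairs a corrupted bitmap exactly as well as a corrupted text file.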


> it remains decodable even when the specific tooling is lost, making it the most robust encoding for long-term information survival.

This may be true if you mean text written on a physical medium (especially if it's engraved in stone or clay), but it's not true at all if you mean text stored in a computer medium. Text is just binary with a dedicated codec. Good luck interpreting Chinese plain text files after humanity has forgotten about Unicode and UTF-8.
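A quick Python illustration of "text is just binary with a dedicated codec": the same six bytes on disk read completely differently depending on which codec you assume.

```python
raw = "你好".encode("utf-8")           # six bytes on disk
print(raw)                             # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# The bytes only mean "你好" under one specific codec:
print(raw.decode("utf-8"))             # 你好
print(raw.decode("latin-1"))           # mojibake: a "valid" but meaningless reading
print(raw.decode("gbk", "replace"))    # different characters entirely
```

Without the UTF-8 table, nothing in those bytes announces which reading is correct, which is exactly the archaeological problem.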

While text-based representations may be easier to decipher than random binary data even without knowing the encoding (as in an archeological setting), it's hardly going to be the easiest. Bitmaps, for example, have a much more limited set of symbols than Unicode, so I'd bet it would be much easier to display a long lost .bmp file than a random .txt file even a few hundred years from now. Same goes for raw audio, too. Now, JPEG and MP3 might be much more difficult, because the encoding is doing much more work.
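To illustrate why a raw bitmap is so forgiving: with 8-bit grayscale pixels, guessing the row width is essentially the only "codec" you need. A toy sketch (the image here is just a synthetic circle):

```python
# A raw 8-bit grayscale image is nearly self-describing: reflow the bytes
# at the right width and the picture appears. No symbol table required.
pixels = bytes(
    255 if (x - 8) ** 2 + (y - 8) ** 2 < 25 else 0
    for y in range(16) for x in range(16)
)

width = 16  # a future decoder would just try widths until rows line up
for row in range(len(pixels) // width):
    line = pixels[row * width:(row + 1) * width]
    print("".join("#" if p > 127 else "." for p in line))
```

Contrast with text: each byte of a .txt file is an index into a table of thousands of arbitrary symbols, and that table lives outside the file.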



