Hacker News

This doesn’t track for me. How can text have lower bandwidth but higher meaning-per-bit? How does that jibe with entropy resistance (in an information theoretic sense)?

Text seems worse to me. First of all, binary encodings are a superset of text encodings. But less abstractly, binary enables content-transparent compression and error correction.
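To make the content-transparency point concrete, here's a minimal Python sketch: a binary pipeline can compress and checksum a payload without knowing or caring what the bytes represent.

```python
import zlib

# Any payload is just bytes to a binary pipeline: compression and
# integrity checks work identically whether the content is text or not.
payload = "some structured data, or an image, or anything".encode("utf-8")

compressed = zlib.compress(payload)   # content-transparent compression
checksum = zlib.crc32(payload)        # content-transparent error detection

restored = zlib.decompress(compressed)
assert restored == payload
assert zlib.crc32(restored) == checksum
```

The same two calls work on a JPEG, a database dump, or a UTF-8 novel; nothing in the scheme depends on the payload being human-readable.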

Like other commenters have pointed out, the downside of binary is needing sufficient tooling. Depending on the domain, that can indeed be a downside. But if that critique isn’t relevant for a given context, it’s extremely unlikely that plaintext (ASCII?) is superior.

Text seems more like the answer to a plea for lowest common denominator of tooling.



Human-readability is the ultimate error correction for the most expensive link in the system: the human-in-the-loop.

The information-theoretic justification is that binary's efficiency assumes a perfectly known codec, but the entropy of time destroys codecs (bit rot/obsolescence). Text sacrifices transmission efficiency for semantic recovery - it remains decodable even when the specific tooling is lost, making it the most robust encoding for long-term information survival.


Human-readability isn't a feature of ASCII though. It's a feature of any encoding for which the user has sufficient tooling. Sure, that's an easier bar to clear for ASCII than for binary formats in general. But as I said, as long as you have the tooling, binary is no less readable. (Also, many binary formats will store strings as ASCII or UTF-8, so you can use the strings utility or whatever you want against them.)
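For illustration, here's a rough Python approximation of what `strings` does: scan an opaque blob for runs of printable ASCII. (The minimum-length threshold here is an arbitrary choice for the sketch, not what GNU `strings` actually uses.)

```python
import re

def ascii_strings(blob: bytes, min_len: int = 4):
    """Pull runs of printable ASCII out of an otherwise opaque binary blob,
    roughly like the Unix `strings` utility."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, blob)]

# A fake binary file: magic bytes, an embedded ASCII string, trailing bytes.
blob = b"\x89PNG\x00\x01" + b"hello, embedded text" + b"\xff\xfe\x00"
print(ascii_strings(blob))  # → ['hello, embedded text']
```

The point being: the embedded text survives even though the container format is binary and unknown to the scanner.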

> the entropy of time destroys codecs (bit rot/obsolescence)

Okay, so you don't mean "entropy" in an information theoretic sense. You're just talking about the decay of time. That's a much more specific claim than your original one, and I grant that it may be true for some use-cases. But you don't need semantic recovery if you don't need to do recovery at all, i.e. if your data format and/or storage medium transparently provide redundancy and/or versioning.
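As a sketch of what "transparent redundancy" could mean, here's a naive triple-replication scheme with per-byte majority voting. This is purely illustrative (real systems use erasure codes or RAID-style parity), but it shows recovery working without any inspection of what the bytes mean:

```python
from collections import Counter

def store(payload: bytes, copies: int = 3):
    # Naive redundancy: keep several independent replicas of the raw bytes.
    return [bytearray(payload) for _ in range(copies)]

def recover(replicas):
    # Majority vote per byte position; survives corruption in a minority
    # of replicas without ever interpreting the content.
    return bytes(Counter(col).most_common(1)[0][0] for col in zip(*replicas))

replicas = store(b"\x00\xff binary or text, the scheme doesn't care")
replicas[1][0] ^= 0xFF  # corrupt one replica
assert recover(replicas) == b"\x00\xff binary or text, the scheme doesn't care"
```

Note that the scheme is indifferent to the payload: it repairs a corrupted bitmap exactly as well as a corrupted text file.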


> it remains decodable even when the specific tooling is lost, making it the most robust encoding for long-term information survival.

This may be true if you mean text written on a physical medium (especially if it's engraved in stone or clay), but it's not true at all if you mean text stored in a computer medium. Text is just binary with a dedicated codec. Good luck interpreting Chinese plain text files after humanity has forgotten about Unicode and UTF-8.
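A quick Python illustration of "text is just binary with a dedicated codec": the same six bytes on disk read completely differently depending on which codec you assume.

```python
raw = "你好".encode("utf-8")           # six bytes on disk
print(raw)                             # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# The bytes only mean "你好" under one specific codec:
print(raw.decode("utf-8"))             # 你好
print(raw.decode("latin-1"))           # mojibake: a "valid" but meaningless reading
print(raw.decode("gbk", "replace"))    # different characters entirely
```

Without the UTF-8 table, nothing in those bytes announces which reading is correct, which is exactly the archaeological problem.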

While text-based representations may be easier to decipher than random binary data even without knowing the encoding (as in an archeological setting), it's hardly going to be the easiest. Bitmaps, for example, have a much more limited set of symbols than Unicode, so I'd bet it would be much easier to display a long lost .bmp file than a random .txt file even a few hundred years from now. Same goes for raw audio, too. Now, JPEG and MP3 might be much more difficult, because the encoding is doing much more work.
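To illustrate why a raw bitmap is so forgiving: with 8-bit grayscale pixels, guessing the row width is essentially the only "codec" you need. A toy sketch (the image here is just a synthetic circle):

```python
# A raw 8-bit grayscale image is nearly self-describing: reflow the bytes
# at the right width and the picture appears. No symbol table required.
pixels = bytes(
    255 if (x - 8) ** 2 + (y - 8) ** 2 < 25 else 0
    for y in range(16) for x in range(16)
)

width = 16  # a future decoder would just try widths until rows line up
for row in range(len(pixels) // width):
    line = pixels[row * width:(row + 1) * width]
    print("".join("#" if p > 127 else "." for p in line))
```

Contrast with text: each byte of a .txt file is an index into a table of thousands of arbitrary symbols, and that table lives outside the file.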



