Converting Pi to binary: Don't do it (everything2.com)
34 points by olliesaunders on Aug 6, 2010 | hide | past | favorite | 35 comments


Hilarious. :)

This is an "illusion" of sorts that works by ignoring contextual information. Sure, all that stuff is in pi, but it takes intelligence for someone to extract the right number of bits starting at the right position and decide to put that in a certain context where it makes sense (SSN, executable machine code, JPEG, etc.).

For example, the UTF-8 representation of this comment is somewhere in pi, but you can't say that the full information of this comment is in pi. Part of the information lies outside of pi, in the person who finds that part of pi and chooses to decode it as UTF-8.


And it directly follows that you can encode any document as 2 numbers: (start-index, length) in the Pi representation. Well, this is also a bad idea since you can always treat the entire document as one binary number...
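A toy sketch of that (offset, length) scheme in Python, searching only a short hardcoded prefix of pi's binary expansion (the hex digits below, 243F6A88..., are taken to be the start of pi's fractional part in base 16, the same constants that seed Blowfish's P-array), so only tiny "documents" will actually be found:

    # Toy sketch: encode a document as (offset, length) into the bits of pi.
    # Only a short hardcoded prefix of pi's fractional part is searched here.
    PI_FRAC_HEX = "243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E6C89"
    PI_BITS = bin(int(PI_FRAC_HEX, 16))[2:].zfill(4 * len(PI_FRAC_HEX))

    def encode(document: bytes):
        pattern = "".join(f"{b:08b}" for b in document)
        offset = PI_BITS.find(pattern)
        if offset == -1:
            raise ValueError("not in this tiny prefix; keep calculating pi...")
        return offset, len(document)

    def decode(offset: int, length: int) -> bytes:
        bits = PI_BITS[offset:offset + 8 * length]
        return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * length, 8))

    # A single byte is usually findable; anything longer almost never is.
    off, n = encode(b"\x31")
    assert decode(off, n) == b"\x31"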


You don't even need length because there must be an instance of the document in pi that is already prefixed by the correct length, so all you really need is the offset of the first occurrence of that variation.


Did you know that absolutely anything you can imagine is encrypted with an OTP in the first N digits of pi, where N is the length of your information?
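True, but vacuously so: for any message there trivially exists a pad that "decrypts" the first bytes of pi into it. A quick Python illustration (the hex string is assumed to be the start of pi's fractional part in base 16, matching the Blowfish initialization constants):

    # For any message, the pad k = message XOR pi exists, so "your message is
    # OTP-encrypted in the first N digits of pi" is true of every possible message.
    PI_FRAC = bytes.fromhex(
        "243F6A8885A308D313198A2E03707344A4093822299F31D0082EFA98EC4E6C89")

    def pad_for(message: bytes) -> bytes:
        if len(message) > len(PI_FRAC):
            raise ValueError("extend the pi prefix for longer messages")
        return bytes(m ^ p for m, p in zip(message, PI_FRAC))

    msg = b"attack at dawn"
    pad = pad_for(msg)
    assert bytes(p ^ k for p, k in zip(PI_FRAC, pad)) == msg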


Pity this sort of OTP is actually unbreakable.

It's because you can find absolutely anything there that you can never be sure that you've found what you're looking for.


It's a shame that such an index-based encoding scheme will use more bits to store the index than the size of the document itself.


This sort of discussion often leads to someone wondering why it can't be used as a compression scheme, especially when you can calculate hexadecimal digits of pi without calculating any of the previous digits.
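That digit-at-a-time trick is the Bailey-Borwein-Plouffe formula. A rough Python sketch (double-precision floats limit how far out this simple version stays accurate):

    # Hex digit of pi at zero-based position n after the point, via the BBP
    # formula, without computing any of the earlier digits.
    def pi_hex_digit(n: int) -> str:
        def frac_sum(j: int) -> float:
            # fractional part of the sum over k of 16**(n - k) / (8k + j)
            s = 0.0
            for k in range(n + 1):
                s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
            k, term = n + 1, 1.0
            while term > 1e-17:
                term = 16.0 ** (n - k) / (8 * k + j)
                s = (s + term) % 1.0
                k += 1
            return s
        x = (4 * frac_sum(1) - 2 * frac_sum(4) - frac_sum(5) - frac_sum(6)) % 1.0
        return "0123456789ABCDEF"[int(16 * x)]

    print("".join(pi_hex_digit(i) for i in range(8)))  # 243F6A88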

The first problem with that is finding the desired content somewhere in the number. You can't just Google it. In fact, I know of no way to search for it except for brute force (i.e. continue to calculate pi until such time as you find it). Because the odds are so low for any reasonably sized file, this is going to take a long time. Even if you measure using a scale where one unit is the current age of the universe, you would find the scale to be too small.

Even if someone solves that problem, the other problem is file size. You have to store offset and length. The length for most files would be reasonable, the problem is the offset. In all probability, that number is larger than your original file. Much larger. There are exceptions, of course, but not enough of them to help.

That said, I wonder how long it will be before some mathematician starts writing theorems about the "set of illegal numbers" like the 09F9 number and the rest.


There's bound to be an exponential form for the offset which would take up a reasonable amount of space. E.g. 2^100^100 takes up just 9 bytes, but if I wrote it out longhand it would be over 3000 bytes. Remember that it doesn't have to be human readable either; the 2 could be assumed and the 100s could be \x64\x64, just two bytes. In that form, you have a compact way to express large numbers, and any offset could be expressed as a sum of some such numbers.

I think the real problem, as you said, is having a copy of the full sequence around for encoding and decoding, and finding any given string of bits within it.


I would imagine that there is no exponential form that takes up less space than the original offset in all cases. You could think of the "exponential form" as a sort of compression algorithm with the same pitfalls: some exponential forms will end up taking more space than the original offset.

Furthermore, I would guess the offset has even less entropy on average than the bit sequence you're hoping to compress, so if you had an algorithm that could shorten the expression of the offset, you might very well be able to apply it to the original bit sequence with the same or better results.

I'd think an information theory expert could probably tackle these sorts of questions very rigorously, but I am not one so this is mostly conjecture.


You are absolutely right, and the argument is the same as why there cannot be a compression algorithm that compresses something without making something else larger.

Namely, there are 2^n possible signals of n bits, so if a compression algorithm compresses some signal longer than n bits down to n bits or fewer, then there aren't enough possibilities left for all the signals of at most n bits to also be encoded in n bits or fewer.
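The counting is small enough to check exhaustively for toy sizes, e.g.:

    from itertools import product

    # There are 2**3 = 8 bit strings of length 3, but only 1 + 2 + 4 = 7 strings
    # of length at most 2, so no lossless scheme can shorten every 3-bit input.
    inputs = ["".join(b) for b in product("01", repeat=3)]
    shorter = ["".join(b) for n in range(3) for b in product("01", repeat=n)]
    assert len(inputs) == 8 and len(shorter) == 7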


Indeed he is correct. I would like to note, though, that you can cap how much space you waste on incompressible data. Just store a flag that indicates whether the chunk was compressed, and don't even try to compress the incompressible chunks. The decoder is simple: read the flag (which only has to be 1 bit) and decompress the payload if it was compressed, or just return it unchanged otherwise.
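A sketch of that flag trick wrapped around zlib (using a whole flag byte rather than a single bit, for simplicity):

    import os
    import zlib

    # Incompressible input is stored verbatim, so the worst case costs only
    # the flag itself (a full byte here instead of the single bit described above).
    def pack(data: bytes) -> bytes:
        squeezed = zlib.compress(data)
        return b"\x01" + squeezed if len(squeezed) < len(data) else b"\x00" + data

    def unpack(blob: bytes) -> bytes:
        return zlib.decompress(blob[1:]) if blob[:1] == b"\x01" else blob[1:]

    assert unpack(pack(b"a" * 1000)) == b"a" * 1000  # compresses nicely
    noise = os.urandom(64)                           # almost surely incompressible
    assert unpack(pack(noise)) == noise              # stored with 1 byte of overhead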

However, there are still limits to exactly how much you can compress things. And sometimes, when you think you've found something that looks like it should compress data down to nothing, you've really just been hiding the data in your decompression program.

In a sense, when you consider special-purpose compression and decompression functions, it's not unlike how "RETR some_huge_file.rar" sent to an FTP server will "decompress" that tiny string into some multi-GB file. But that only works because the server already has a copy of the data.


The problem is information content. 2^100^100 is a special case: what if I want 2^100^100+1? Or (2 * 10^100) + (1 * 10^99) + (3 * 10^98) + (7 * 10^97)...? It would be much shorter to write that in its regular decimal form, 2137... So to specify a sequence, you would (on average, not for special cases like the one you picked) need just as much information as there is in the sequence!


That's funny, but it reflects a poor understanding of probability.

Figure as an upper bound that there are 10^90 particles in the universe. Now imagine that you could turn the entire universe into such an efficient computer that every particle stored a bit of information, and you used it to calculate pi.

The chance of finding an arbitrary binary string of length N somewhere in those 10^90 bits is about (1/2^N)*10^90. Set that to .5, solve for N, and you get . . .

Roughly 300 bits. You have a 50/50 probability of finding an arbitrary 300-bit string somewhere in there. For reference, that's under 40 characters of ASCII. Maybe enough for a threat against the president. CD cracking software seems more dubious.
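Checking that arithmetic in Python:

    import math

    # With 10**90 stored bits of pi, an N-bit pattern is expected to show up
    # about 10**90 / 2**N times. Solve 10**90 / 2**N = 0.5 for N:
    N = math.log2(2 * 10**90)
    print(N, N / 8)  # roughly 300 bits, i.e. about 37 ASCII characters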

When dealing with brute force probability, the numbers get big very, very fast. Don't assume that just because they seem big to you, they're big enough to solve the problem you're thinking about. Do the math.


This reminds me of a Python script some friends and I wrote that would generate all the possible images of a given arbitrary size :)
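Presumably something along these lines (a 1-bit-per-pixel version; real color depths make the count astronomically worse):

    from itertools import product

    # Every possible 1-bit image of a given size, as tuples of rows. There are
    # 2 ** (width * height) of them, which is why nobody gets far past a handful
    # of pixels.
    def all_images(width: int, height: int):
        for bits in product((0, 1), repeat=width * height):
            yield tuple(bits[r * width:(r + 1) * width] for r in range(height))

    print(sum(1 for _ in all_images(2, 2)))  # 16; a mere 10x10 image already has 2**100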


Did you ever get past the first 10 pixels?


It is worth noting that pi also contains every possible convincing lie, along with photographic evidence.

(don't trust pi!)


It could be fun if someone did an analysis of the more interesting strings that can be extracted from the trillions of digits of pi calculated so far.

I wonder if we can even extract a full sentence that makes sense out of the current digits that we have.
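A rough back-of-the-envelope estimate says no. Assuming about five trillion known decimal digits (roughly the 2010 record) and treating the expansion as random:

    import math

    # An N-bit pattern is expected to appear about bits / 2**N times, so only
    # patterns up to roughly log2(bits) bits are likely to be present at all.
    known_decimal_digits = 5e12                 # assumed figure for "trillions"
    bits = known_decimal_digits * math.log2(10)
    print(math.log2(bits) / 8)  # ~5.5 bytes: a short word, nowhere near a sentence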


I just realized that--if expanded to enough digits--the alphabet clock contains every possible finite English sentence (without punctuation or spaces). And this happens at every moment.

http://alphabetclock.com


To be fair, this would be true of base-10 pi as well.


I always wondered how binary conversion works for numbers with a fractional part.
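For the fractional part, the usual trick is repeated doubling; a quick sketch (Python floats only carry ~52 bits, so this is illustrative rather than a real pi converter):

    # Convert the fractional part of x to binary by repeatedly doubling it and
    # peeling off the integer part.
    def frac_to_binary(x: float, bits: int) -> str:
        x -= int(x)
        out = []
        for _ in range(bits):
            x *= 2
            bit = int(x)
            out.append(str(bit))
            x -= bit
        return "".join(out)

    print(frac_to_binary(3.141592653589793, 16))  # 0010010000111111 (hex 243F...)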



Someone should write a sci-fi story about a research facility where every possible resource is dedicated to exploring strings of bits of pi. In the process they could discover a lot of things, like:

1) We are living in an illusion and are being simulated (not like The Matrix; as in, an actual program) on the computer of a geeky alien in his basement (or the equivalent in his world).

2) A method is discovered to identify the bit strings that describe the real world, which yields a computer that knows all its secrets.

3) We are not the makers of our own destinies; pi holds our past, present and future.

Man, this is an exciting excursion into science fiction.


This is more or less the plot of Borges' "The Library of Babel," except he doesn't talk about binary, computers, or pi.

Instead, the Library is a place whose books contain every possible string of symbols.

http://jubal.westnet.com/hyperdiscordia/library_of_babel.htm...


In Carl Sagan's 'Contact', "Ellie, acting upon a suggestion by the senders of the message, works on a program which computes the digits of π to record lengths and in different bases. Very, very far from the decimal point (10^20) and in base 11, it finds that a special pattern does exist when the numbers stop varying randomly and start producing 1s and 0s in a very long string. The string's length is the product of 11 prime numbers. The 1s and 0s when organized as a square of specific dimensions form a rasterized circle."


Haha, you can't make a square with a number of elements equal to the product of 11 different primes (or any number of primes).


Given that it was a "square of specific dimensions," I just assumed that it was supposed to be a rectangle.


It wouldn’t work. Say you are looking for a recipe for apple pie. You will be able to find an infinite number of recipes for apple pie; which do you pick? Not only that, there will be an infinite number of recipes with the headline “Recipe for Apple Pie” which are, in fact, not recipes for apple pie.

Still not convinced that mining pi for usable data would be a bad idea? How about this: you will be able to find those recipes not just once, you will find them many times over. As ASCII-encoded bytes, as a PDF, as a DOC file, as a JPEG, as an iPhone and Android app (different versions for every version of iOS and Android that there will ever be). And you don’t have to stop there; you can just keep going. Just think of the millions of versions of the recipes with a few typos.


Just introduce two things:

1) A genius mathematician.

2) A theorem to extract contextual information from the bits of pi. It would give not only the information but also work out the context.


Why not a lone computer hacker who discovers his own destiny written in pi? Or discovers the corruption of the evil government, or something along those lines?


Or the infinite amount of strings that sound just like his destiny but are, in fact, not it?


I can't decide which would make a better thriller: decoding his actual destiny, or trying to figure out which one is the real destiny.


I don’t think that hacker has thought that through. Sure, he will find an infinite number of biographies which tell the story of his life up to the present with unbelievable accuracy; he has, however, no way of knowing which one of them actually predicts the future. The information just isn’t there. Oh, sure, among the infinite number of biographies there will be one which is correct, but it doesn’t come with an “authentic biography” meta tag (well, actually there will be a version that does, but so will the fakes).


Carl Sagan: Contact. Ellie found an image of a circle encoded in the base-11 digits of pi.


Sagan does something like this in Contact (the book).


You just ruined my evil plans to rule the world

Yes, the plans involved Pi.



