I had an idea for a file transfer system based on digits of pi. You'd give everybody DVDs of digits of pi (or they could calculate them themselves), and then transfer files faster by just sending them the offset into pi.
At the time I thought it could work with a big enough bank of digits of pi on both sides. If transfer was expensive and calculating digits was cheap, then you could give everyone an infinite supply of digits of pi and have a nearly infinite compression system.
I discovered that often the offset into pi is much larger than the data you are sending. Turns out it's an expensive way to send things.
Also, it turns out that this area was already well understood. There are no free lunches with entropy.
I think a lot of people had similar ideas; I know I did.
The idea of using pi is essentially the same as having a shared dictionary, as used both in regular compression and in lossy compression of waveforms (in layman's terms; technically it's a best match): both ends have the dictionary and you simply index into it.
In fact, whole network protocols use this concept too, such as Protocol Buffers, which are part of gRPC.
The difference here is that the dictionary is infinite, and until we get quantum computers on the desktop, indexing into pi will always be slow. Some of the address space may not be feasible either; e.g. your file may require a billion-bit address/index.
It may still be feasible to do partial matches for a file (50% at one index, 50% at another) as a performance improvement.
I guess the trade-off is a balance between performance and the number of indexes vs. the original file length, and where in pi the address space becomes infeasible.
zlib also supports dictionaries, as does the deflate algorithm. The problem is that zlib doesn't provide any tooling to generate an optimal dictionary from a large sample, whereas zstd does provide such a utility.
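For the curious, here's roughly what that looks like with the python-zstandard bindings (a sketch; the package, sample data, and dictionary size are my assumptions, and training wants a reasonably large, varied sample set):

```python
import zstandard as zstd

# Train a shared dictionary from small samples, then compress with it: the
# practical version of "both ends have the dictionary and you simply index it".
samples = [
    f'{{"id": {i}, "name": "user{i}", "email": "user{i}@example.com"}}'.encode()
    for i in range(1000)
]
dict_data = zstd.train_dictionary(4096, samples)

cctx = zstd.ZstdCompressor(dict_data=dict_data)
dctx = zstd.ZstdDecompressor(dict_data=dict_data)

msg = b'{"id": 31415, "name": "user31415", "email": "user31415@example.com"}'
frame = cctx.compress(msg)
assert dctx.decompress(frame) == msg
print(len(msg), "->", len(frame))  # tiny records shrink far more with a shared dictionary
```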
> I discovered that often the offset into pi is much larger than the data you are sending.
Shouldn't the offset be roughly the same size? If you look for a 1000-digit sequence, you need to try about 10^1000 starting points, which makes your offset about 1000 digits long.
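A quick simulation backs this up, using a uniform random digit stream as a stand-in for pi (that stand-in is the assumption here, since normality is unproven):

```python
import random

# Monte Carlo sanity check: in a uniform random digit stream, the first
# occurrence of a k-digit pattern lands around offset 10**k, so writing
# the offset down costs about k digits -- no savings.
def first_offset(pattern: str, rng: random.Random) -> int:
    k, window, pos = len(pattern), "", 0
    while True:
        window = (window + str(rng.randrange(10)))[-k:]
        pos += 1
        if window == pattern:
            return pos - k  # 0-based offset of the match

rng = random.Random(0)
k = 4
trials = [first_offset(f"{rng.randrange(10**k):0{k}d}", rng) for _ in range(200)]
print(f"average offset for {k}-digit patterns: {sum(trials)/len(trials):,.0f}")
```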
> Also, it turns out that this area was already well understood. There are no free lunches with entropy.
If I ever go mad and breach the 'crackpot line', this will be it. There is something about entropy that I just can't fully grasp, even though I understand (or can convince myself that I do) Shannon, Kolmogorov, etc.
I love the idea of this, it has certainly boggled my mind a handful of times when thinking of content-addressed storage. Obviously it can't work, there is no loophole to infinite space or compression. Entropy is a very real thing. But it did lead me down an interesting train of thought.
> maximise performance, we consider each individual byte of the file separately, and look it up in π.
Ok, so we store a byte and look it up in π. Now we get an offset. The exact offset will depend on the byte, of course. But to simplify, let's assume that π is "optimal": we will assume that the first 256 offsets contain all 256 possible byte values.
So our offset will be in the range 0-255. Storing our offset will then take 1 byte of storage.
Oh, I have found the problem.
So yes, you can find any data in π. But storing the location of that data will on average take the same amount of space as the data itself.
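A toy sketch of that best case, with a shuffled 256-byte table standing in for an "optimal" π:

```python
import random

# Under the "optimal pi" assumption: a 256-byte dictionary in which every
# byte value appears exactly once (a shuffled table stands in for pi).
# Each input byte maps to a 1-byte offset, so even the best case
# compresses by exactly nothing.
rng = random.Random(314159)
table = list(range(256))
rng.shuffle(table)
DICT = bytes(table)

def encode(data: bytes) -> bytes:
    return bytes(DICT.index(b) for b in data)   # one offset per input byte

def decode(offsets: bytes) -> bytes:
    return bytes(DICT[o] for o in offsets)

msg = b"no free lunch"
assert decode(encode(msg)) == msg
assert len(encode(msg)) == len(msg)             # output exactly as long as input
```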
But I suspect that the analysis would look very similar for the entire file: just calculate the average offset for a file of a given size. It won't be smaller than the file itself.
I guess that is mostly true. In theory you could think of a more efficient way to store the indexes, such as run-length encoding. So you could store 1 MB of whatever the first byte of pi is, and then run-length encode the million zero addresses. You could also imagine a scheme such as 2-bit varint encoding of the indexes, or a tally system that uses only 1 bit per offset to store a zero.
...of course you are still better off to just compress the file consisting of a single repeated byte.
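For scale, a minimal comparison with off-the-shelf deflate, no pi involved:

```python
import zlib

# 1 MB of a single repeated byte: the pi scheme would emit a million
# identical offsets to run-length encode, but plain deflate on the file
# itself already collapses it to about a kilobyte.
data = bytes([0x42]) * 1_000_000
print(len(zlib.compress(data)))  # roughly a thousand bytes
```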
> So yes, you can find any data in π. But storing the location of that data will on average take the same amount of space as the data itself.
Not on average. Not even at minimum. Since the leading values of pi are definitely _not_ so compact (and we have yet to identify any contiguous segment that is), the absolute minimum is N plus the overhead to support the optimization of sometimes using single-byte offsets by declaring the offset size in a header.
I don't think this is true. If I coincidentally want to store the first million digits of pi, then my offset is 0, which is certainly smaller than the data. So you can get lucky.
Not here, because this mechanism stores offsets to individual bytes not large blobs. Storing the first N bytes of Pi (it makes more sense to talk about bytes than digits) requires N offsets plus either a header indicating how big each offset needed to be or variable-length delimited offsets, which necessarily likewise increases the offset size. In the absolute smallest case that's N+1 bytes.
You can encode every book and all information in a small rod. Just take a 1-metre-long rod and encode your book into binary, e.g. 01011110111. Now take your binary string and turn it into a number smaller than 1 by prepending "0." and reading it in base 2, e.g. x = 0.01011110111. As this number is finite and smaller than 1, you can just take your rod and cut it at length x.
Now when you want to retrieve your information, you just measure x from your rod with a ruler, write the fractional part out in binary, and voila.
You can encode an almost infinite amount of data into a simple rod, assuming you can measure and cut very precisely.
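A sketch of the rod trick in exact arithmetic; the only (fatal) assumption is a ruler of unbounded precision:

```python
from fractions import Fraction

# The bit string becomes a binary fraction in [0, 1), i.e. the cut position
# in metres, and reading the bits back is just re-expanding that fraction.
# The catch: n bits of payload demand measuring to within 2**-n metres,
# which goes subatomic almost immediately.
def bits_to_cut(bits: str) -> Fraction:
    return Fraction(int(bits, 2), 2 ** len(bits))

def cut_to_bits(x: Fraction, n: int) -> str:
    return format(int(x * 2 ** n), f"0{n}b")    # zero-padded to keep leading bits

bits = "01011110111"
x = bits_to_cut(bits)
print(f"cut the rod at {x} m ({float(x):.10f})")
assert cut_to_bits(x, len(bits)) == bits
```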
Small problem of how to make cuts at a subatomic level ;)
(edit for pedantry and type of atom) At an optimistic 100 billion carbon atoms in length for a metre stick, the maximum number of distinct cuts you can make is 100 billion, or 100 gigabytes of info. We do much better these days with thumb drives.
maybe you're measuring by counting atoms with a scanning electron microscope? 'course at that point why not just encode the information in the atom layer directly :)
That's only if you can make multiple cuts (or slits), and it's actually 8x less because we're talking about bits, not bytes.
With only one cut, as the parent comment describes, you can only store log base 2 of 100 billion, which is about 36 bits; so about 4 bytes of info, or one long integer.
"That's right! Every file you've ever created, or anyone else has created or will create! Copyright infringement? It's just a few digits of π! They were always there!"
I didn't think that was actually mathematically proven yet. Was some proof accepted recently that makes that quoted sentence true?
There was an examination of this sort of thing in the webcomic Freefall --- shortest story is 6 words, 100,000 words for English vocabulary, order matters --- it worked out to a data storage unit the size of a small moon.
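The back-of-envelope behind that conclusion, taking the comic's numbers at face value:

```python
# Six-word stories drawn from a 100,000-word vocabulary, order mattering.
vocab, length = 100_000, 6
stories = vocab ** length
print(f"{stories:.1e} distinct stories")   # 1.0e+30
# even at a single bit per story, that's ~1.25e29 bytes of raw storage
```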
No one has proved that pi is normal, true. But pi is normal. I mean come on! Like just between us, two friends talking, we both know that pi is normal. It'll be a nightmare to prove but I'd put extremely good odds on it. Probably better odds than RSA being secure, which is also not strictly proven but pretty likely.
Look we know that pi is normal. You know that pi is normal, I know that pi is normal. I'll give you 10,000:1 odds that we find any string you want with exactly the probability you'd expect from pi being normal, and before long you'll have forked over your life savings. It's annoying to prove, we barely know if any numbers are normal, but the idea of a normal number was invented to discuss what numbers like pi or e seem to be like. In math that's not enough, but this is CS! All encryption is built on weaker grounds than pi being normal.
Okay, so I just have to find a sequence that hasn't yet been discovered to exist in pi? Or do I have to wait for a proof that pi isn't normal (if one exists) to get my money?
My sequence is a trillion consecutive 0s, followed by a trillion consecutive 1s, etc, all the way up through a trillion consecutive 9s. Who wins the bet?
> with exactly the probability you'd expect from pi being normal
In this case you win when you've checked enough digits of pi that, if pi were normal, you'd have expected to find your string by then. Probably not gonna happen, but if you ever do, hit me up! If you do that work, I'll have the opportunity to win lots of smaller bets.
edit: Wait, that loses me catastrophic amounts of money. Keep checking until P < 1/11,000; I think that's fair.
As a fairly simple example, consider the number written in decimal notation as:
0.101001000100001000001... (with the run of zeroes between successive 1s growing by one each time). It doesn't contain all sequences of digits, or even all digits, yet it cannot be written as a fraction.
Yes. It can quite easily (depending on your knowledge of basic number theory) be proven that a number is rational if and only if its digits (in any base) eventually become periodic.
My understanding is that it is a trivial (or at least well-known) result that a rational will always have a finite, infinitely-repeating terminal sequence in any base.
This is a fallacy. Infinity does not contain all possibilities.
There are an infinite number of integers. You can start at 10 and count up forever, never running out of integers. But no matter how high you count, you’ll never count to “orange” - “orange” is not contained in the sequence of infinite integers.
You’ll need to first prove that every sequence of integers is contained somewhere in pi, since the number of possible integer sequences grows faster than the “space” for sequences in pi. In other words, I can always pick a digit that creates a valid, non-repeating integer sequence from the pool of possible sequences while never creating the integer sequence “123456789123456789123456789.” You’d need to prove that pi doesn’t do this.
Even if pi does contain every sequence of integers and you could map that to bytes which, in turn, maps to a file, this would not compress.
Your metadata directory would be larger than the raw files unless you get very lucky and your file is very early in the sequence of pi.
A byte can represent 256 unique values, and 256 unique values cannot compress to less than a byte. So if your index is the digit of pi where your file starts, your file starts after some number of other files. Your index is going to be an index into the address space of “all possible files.” This will get large very quickly.
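The counting argument behind that, made concrete (nothing pi-specific here):

```python
# There are 256**0 + 256**1 + ... + 256**n distinct files of at most n bytes,
# so an index able to name any of them must itself carry about 8n bits --
# as long as the file it points at.
n = 64
total = sum(256 ** k for k in range(n + 1))
print(total.bit_length(), "bits of index vs", 8 * n, "bits of file")  # 513 vs 512
```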
> This is a fallacy. Infinity does not contain all possibilities.
From the link: One of the properties that π is conjectured to have is that it is normal, which is to say that its digits are all distributed evenly, with the implication that it is a disjunctive sequence, meaning that all possible finite sequences of digits will be present somewhere in it. If we consider π in base 16 (hexadecimal), it is trivial to see that if this conjecture is true, then all possible finite files must exist within π. The first record of this observation dates back to 2001.
> Your metadata directory would be larger than the raw files unless you get very lucky and your file is very early in the sequence of pi.
This is very, very clearly a tongue-in-cheek project and not intended as a practical way to store files.
Being normal requires not only the correct frequencies of single digit substrings, but of k-digit substrings as well, for all k. One number that is normal to base 10 is Champernowne's number [1].
.(0123456789) is rational, though. I think the implication is that if pi is normal in addition to its other properties — specifically irrationality — that it has to contain all possible digit sequences.
Edit: oops, I forgot about n-length sequences of digits, .(0123456789) is definitely not normal, this is why I’m not a mathematician.
Pi may contain any given sequence of digits, but it's not proven that it contains all sequences of digits. IMHO, it's possible to construct a sequence of digits which will not be in pi. For the first N digits of pi, it's easy to construct such a sequence for any N (for example, N zeroes).
Constructing things for N digit subsets can be misleadingly easy.
It's easy to construct a sequence of digits that is larger than any N digit number. But it's obviously impossible for any such number to be the largest number.
This reasoning is true of every real number, yet it's been proven that almost all real numbers are absolutely normal and therefore contain every finite sequence of digits.
It's a slight misstatement of what a normal number is that changes it significantly. It isn't that each digit is evenly distributed, but that for every n, every string of digits n long is evenly distributed.
Ok, but I don't see how that relates to the idea of storing sequences of numbers (the basic function of a filesystem) in pi. "Orange" is not a number, so it doesn't make sense to look for it in any sequence of numbers. "Orange" can, however, be represented as a sequence of ASCII characters, and if every sequence of numbers is contained in pi, then an ASCII representation of any string can also be found.
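For instance, one hedged way to go looking (zero-padded decimal ASCII is just an arbitrary encoding choice, and mpmath is assumed available):

```python
from mpmath import mp, nstr

# Encode "orange" as zero-padded decimal ASCII codes and search for that
# digit string in the first 100,000 digits of pi. An 18-digit needle should
# need ~10**18 digits to show up, so expect a miss this early.
mp.dps = 100_010
digits = nstr(mp.pi, 100_000)[2:]                 # "14159..." with the "3." stripped

needle = "".join(f"{c:03d}" for c in b"orange")   # "111114097110103101"
print(f"searching for {needle}: offset {digits.find(needle)}")  # very likely -1
```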
> This will get large very quickly.
Right, but here's a thought...
Suppose the place in Pi where your sequence is located is huge. You could maybe (probably) find a sequence earlier in Pi that is a pointer to your actual location :) So the metadata would need three things to make this work: 1) the length of your data, 2) a checksum of your data, and 3) a pointer into Pi. If the hash of the data at (pointer, length) doesn't match the checksum, it's a pointer (which may point to a different pointer, etc.).
That might make it compress a bit better, although you may need a huge amount of RAM just to dereference the pointers.
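A toy sketch of that decode loop; everything here is made up for illustration, with a pseudorandom byte stream standing in for pi:

```python
import hashlib
import random

# The metadata is (offset, length, checksum); any chunk that doesn't match
# the checksum is reinterpreted as a pointer to chase next.
rng = random.Random(3141592)
FAKE_PI = bytes(rng.randrange(256) for _ in range(1 << 20))  # stand-in for pi's bytes

def resolve(offset: int, length: int, checksum: bytes) -> bytes:
    while True:
        chunk = FAKE_PI[offset:offset + length]
        if hashlib.sha256(chunk).digest() == checksum:
            return chunk                                      # found the actual data
        offset = int.from_bytes(chunk, "big") % len(FAKE_PI)  # follow the pointer

data = FAKE_PI[123_456:123_464]                   # pretend this 8-byte run is our file
print(resolve(123_456, 8, hashlib.sha256(data).digest()))
```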
> Suppose the place in Pi where your sequence is located is huge. You could maybe (probably) find a sequence earlier in Pi that is a pointer to your actual location
If the sequence of bytes necessary to represent the file encodes to an integer index into pi, and that integer index encodes to a sequence of bytes larger than the original file, I have no reason to believe trying to repeat that process would result in anything other than an even larger integer index.
> But no matter how high you count, you’ll never count to “orange” - “orange” is not contained in the sequence of infinite integers.
Well, certainly not with that attitude!
Meanwhile, in my highly efficient, proprietary fruit string compression scheme, "orange" is represented by the integer 8. (Don't confuse it with "blood orange" though; that's 26.)
You can encode orange in RGB and HSL too, but the set of all integers still does not contain the concept of an orange. You’ve just assigned meaning to an integer.
In the same way you can’t count to orange, you can never start at 1 and count up to -1. There are an infinite number of different integers greater than 1, and that infinity does not contain -1.
SciFi likes abusing this. Just because there are an infinite number of universes doesn’t mean there is a universe that contains anything you can dream up Rick and Morty style.
> You can encode orange in RGB and HSL too, but the set of all integers still does not contain the concept of an orange.
Nobody claimed "the concept of an orange" can be found. The claim that the word "orange" can't be found, however, is provably false.
If we're going by this logic, then nothing but 0 and 1 can ever be stored on any medium, because a hard drive can't possibly store "the concept of an orange" either.
> Just because there are an infinite number of universes doesn’t mean there is a universe that contains anything you can dream up Rick and Morty style.
And you know this with certainty.... how? With enough branches in a truly infinite timeline, the likelihood of there being a combination which produces a specific outcome is quite high.
> And you know this with certainty.... how? With enough branches in a truly infinite timeline, the likelihood of there being a combination which produces a specific outcome is quite high.
It is not. It is quite low. The number of possible states grows at a faster rate than the rate necessary to maintain “infinity.” It’s only true if you assert that all possible states at a branch are added to the set, which is a tautology.
A fun puzzle. Let’s say you have an empty set whose members are sets. You add a set to it of infinite size (like the set of all positive integers). Next, you add another set to it whose size is also infinite but whose members are not contained in the first set. Now you repeat this process an infinite number of times.
You have an infinite set of infinite sets. The question is: is such a set possible and, if so, does your infinite set of infinite sets necessarily contain all possible infinite sets?
Most pitches for Everett branches (which are a specific enough concept that it's silly to get all technical, but whatever) hold that there is an amplitude distribution over the entire configuration space of the universe. While the proposed mechanism of decoherence implies a sort of discreteness, every single possible configuration would have non-zero amplitude. Of course there's no specific evidence for this over, say, collapse theories (or something stranger, like Bohmian mechanics), but there's certainly no philosophical issue.
Not sure what you think ZFC has to do with physics, but an empty set with members is a tad nonexistent.
Did you reply to the wrong person? I actually said just as much in my comment.
> If we're going by this logic, then nothing but 0 and 1 can ever be stored on any medium, because a hard drive can't possibly store "the concept of an orange" either.
What if I created a new system of encoding words, where the digit "1" represents the word "orange"? Does that disprove the claim, just as encoding it with bytes does?
> the set of all integers still does not contain the concept of an orange. You’ve just assigned meaning to an integer.
We all do, all day, every day. Isn’t that exactly what computer science and mathematics are?
> Just because there are an infinite number of universes doesn’t mean there is a universe that contains anything you can dream up Rick and Morty style.
Sure, there are many different kinds of infinities, but what does that have to do with pi and file systems?
What if a digit (say, 9) only appears in pi up to a certain decimal point? If your sequence contains that digit and doesn't appear by that point, pi won't contain your sequence.
If pi is indeed normal (whether proven or not), that would never happen. A normal number, by definition, has an even distribution of digits. There could be no "last 9", because there would have to be more nines afterward to satisfy the distribution. Basically, the count of nines must match the count of each of the other nine digits, as the number of digits goes to infinity.
Even if the first million digits of some unnamed number were all nines (with no more afterward), it would not be normal, because the frequency of nines would tend to zero as the number of digits goes to infinity.
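You can at least check a finite prefix for the right digit frequencies (mpmath assumed available); of course that's evidence, not proof:

```python
from collections import Counter
from mpmath import mp, nstr

# The first 100,000 digits of pi come out near 10,000 per digit, which is
# consistent with normality but says nothing about the full expansion.
mp.dps = 100_010
counts = Counter(nstr(mp.pi, 100_000)[2:])
print(sorted(counts.items()))
```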
Come on, the length of the index into pi is anywhere between 0 and infinity. I don't think anyone feels defrauded. If anything, this is a gambler's dream!
If π is a normal number, then it contains all possible sequences. However, it is currently unknown whether π is a normal number. I agree with you regarding the fact that this might not be a great compression algorithm, but it is certainly fun.
> Even if pi does contain every sequence of integers and you could map that to bytes which, in turn, maps to a file, this would not compress. [...] This will get large very quickly.
If I were to dumb this down (so I can understand it), is this a fair analogy to the adage "give 1 million monkeys 1 million typewriters and they'll eventually type the entire works of William Shakespeare"?
Somewhere in pi (at some insane offset) is the entire work of William Shakespeare.
I don't think an infinite amount of monkeys with an infinite amount of time are guaranteed to write a work of Shakespeare. If they acted completely randomly it would be, but they don't. They're monkeys. They follow certain behavioural patterns. Just because they have an infinite amount of time doesn't mean they're going to try everything there is to try.
It's a theorem. And yes, it requires the monkeys to be random. And yes, it was tested with monkeys and the results were mainly the letter 's' repeated.
Really all this proves is that if you had infinite Shakespeares and infinite time, one of them would write his complete works using only the letter 's'.
Technically 1 million monkeys with 1 million typewriters could type for a million years and yet still not produce a work of William Shakespeare. In theory (as I understand it), pi at some index contains all permutations of sets of values. I'm not a mathematician and only know this by hearsay.
> Technically 1 million monkeys with 1 million typewriters could type for a million years and yet still not produce a work of William Shakespeare
You have to calculate the probability of a monkey coming up with a work of Shakespeare, and then subtract that from 1 to get the probability of a monkey coming up with anything other than a work of Shakespeare, and then figure out what power N to raise that to (N attempts) for it to cross below some confidence value, like 0.05 ... then subtract that from 1 in order to get the probability of a monkey coming up with a work of Shakespeare after that many attempts.
(no matter how high N is, you will never get 0, so no matter how many times you attempt, you always "could" still not produce a work of Shakespeare, because there's always a chance that none of those attempts worked)
Then calculate how much time that might take (actually, you might even have to account for failed attempts that are shorter or longer than a work of Shakespeare, so you might even have to start with time rather than number of attempts, but I digress) and you have a possible figure for "how much time it would take to have a certain probability of a monkey having come up with a work of shakespeare" and oh dear I may have missed the joke
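For anyone who wants the arithmetic, a sketch of that exact calculation with toy assumptions for the alphabet size and target length:

```python
from math import log, log1p

# Toy numbers: alphabet of 27 symbols (letters plus space), a 20-character
# target rather than all of Shakespeare. p is the chance one random attempt
# of the right length matches exactly.
A, L = 27, 20
p = A ** -L

# Solve (1 - p)**N <= 0.05 for N: the number of attempts needed before
# "still no Shakespeare" drops below the 5% confidence line.
N = log(0.05) / log1p(-p)
print(f"p = {p:.2e}; N ~ {N:.2e} attempts for a 95% chance of success")
```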
Wake me up when it's self-bootstrapped. This repo should contain the metadata for the source code of itself in pi. Or at the very least the metadata for "Everything can be stored in pie" in pi.
I personally thought of this during my college days. I never pursued the idea, but I kept it in mind as a way to compress files: share just an index to the first piece of metadata, which would then link to the next, and the next, and so on.
But it was a fun idea to kick around.