The main problem with "illegal numbers" seems to be that for any illegal number ...

a3_nm · on Oct 29, 2012

As a general rule, using secret sharing protocols, for any choice of n and k, you can split any illegal piece of information a into n pieces such that k of these pieces together can reconstitute a but <k pieces will not give any information about a. It is hard to tell to what extent an individual piece is illegal or not.

Related: http://www.madore.org/~david/misc/freespeech.html (does something similar using XOR, and discusses interesting uses of that kind of system).
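To make the splitting idea concrete, here is a minimal sketch of the simplest case, where k = n and the pieces are combined with XOR. It is not a general k-of-n threshold scheme (that needs something like Shamir's secret sharing), and the names `split` and `combine` are made up for illustration.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(secret: bytes, n: int) -> list[bytes]:
    """Split `secret` into n shares; all n are needed to rebuild it."""
    if n < 2:
        raise ValueError("need at least two shares")
    # The first n-1 shares are pure random noise.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # The last share is the secret XORed with all the noise shares.
    last = reduce(xor_bytes, shares, secret)
    return shares + [last]


def combine(shares: list[bytes]) -> bytes:
    """XOR all shares together to recover the secret."""
    return reduce(xor_bytes, shares)


secret = b"illegal number"
shares = split(secret, 3)
recovered = combine(shares)
print(recovered == secret)
```

Any n-1 of these shares are statistically indistinguishable from random bytes, which is what makes it hard to say whether a single piece is "illegal".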

wmf · on Oct 29, 2012

This is covered in "color", already posted in this thread.

001sky · on Oct 29, 2012

This is a good point, as is the derivative notion: are my thoughts about illegal numbers themselves illegal? Evidentiary issues aside, etc.

Evbn · on Oct 29, 2012

Give us an actual example of b and c that are created without knowledge of a, and you may have a case. But information theory says it is unlikely, if a is deeply original and not derivative of something like b and c to begin with.

aes256 · on Oct 29, 2012

I may not be on the same wavelength as rolux here, but compression would be one example. Create B (a series of compressed files) and C (a decompressor), both unique creations in their own right, that when combined create the original work of another person, A.

Now I'm no expert on compression, but there are surely a vast, if not infinite, number of possible pairs B and C that could combine to form the original copyrighted work.

Thus, with copyright law in its current state, we are not simply granting copyright holders the exclusive rights to a particular sequencing of 1s and 0s, but also rights over any method of creating that sequence.

If there are an infinite number of methods of creating that sequence — that is to say, any B, if combined with a suitable C could form A — then aren't we in effect granting rights over everything to the copyright holder? Where do we draw the line?

The 1s and 0s that make up this post could, with suitable decompression, form an exact copy of a hit blockbuster, but neither this post nor the decompressor would resemble the blockbuster on their own.

Edit: Interesting idea time. If I uploaded a series of files alleging they are encrypted copies of blockbuster films, but resolved not to release the encryption keys to any of the files for 12 months, would the copyright holders have the right to have the encrypted files taken down in the meantime?

They can't actually prove that the files are infringing without the encryption key. Is the mere suggestion that a file may potentially, with some manipulation (i.e. decryption with the appropriate key), resemble a copyrighted work, sufficient to have it taken down?

wmf · on Oct 29, 2012

Again, that can only work if either B or C is a derivative of A.

aes256 · on Oct 29, 2012

Apparently what I was describing here has already been put into practice in the shape of Monolith [1].

In the case I describe, as in the case of 'munging' using Monolith, neither B nor C bears any resemblance to A except when the two are combined.

In my view, at least, you cannot say either is a derivative of A. To do so would bind you to declare everything a derivative of A, because any B when combined with an appropriate C can form A.

If I take your post here as B, you will likely deny it is a derivative of any copyrighted work, but if I make the text of the post the encryption key to another file (C) which, when decrypted, becomes copyrighted work A, is your post itself derivative of a copyrighted work?

What makes your post non-derivative and the encrypted file I create derivative? They are both nothing in themselves, and yet a copyrighted work when combined with the other.

[1] http://monolith.sourceforge.net/

Edit: Reading through the article on the color of bits it seems this exact argument prompted the article in the first place. I guess I should finish reading this!

wmf · on Oct 29, 2012

Let me put it a different way. If A = decompress(B), then necessarily B = compress(A), so B is obviously a derivative of A. Introducing xor does not change anything; one of the parts must be a derivative of the original.
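One way to make this disagreement concrete is a small XOR sketch. The byte strings below are placeholders, not real works, and the variable names are only illustrative. B can be anything at all, including unrelated text, but the matching C can only be computed by someone who already holds A.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


work = b"copyrighted work A"  # placeholder standing in for A

# B is generated without ever looking at A.
pad = secrets.token_bytes(len(work))
# C is computed directly from A: this is the "derivative" step.
cipher = xor_bytes(work, pad)
print(xor_bytes(pad, cipher) == work)

# Any other B of the same length also has a matching C,
# but producing that C again requires A as an input.
other_pad = (b"any unrelated post text " * 4)[: len(work)]
other_cipher = xor_bytes(work, other_pad)
print(xor_bytes(other_pad, other_cipher) == work)
```

Both positions in the thread show up here: every B "works" with some C, yet in each pair at least one part was produced by a function that took A as input.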

aes256 · on Oct 29, 2012

Okay, let me propose an alternative procedure.

I set a series of random number generators going, and with each set of results, I apply randomly generated XOR to create a new sequence of numbers.

I perform this process over and over. Eventually, it produces (give or take a few bits) a copy of an MP3 file of a copyrighted work.

Now, once we've eliminated any procedure of creating B and C that includes A, would you still say one of B or C is a derivative of A?

Should my random number generator be banned? Perhaps more importantly, do I acquire copyright to all the files it creates?

I could quite easily create every possible variation of an MP3 file of a given length. Does that mean any musician who, using a different procedure, produces one of these files is infringing on my copyright?

nathan_long · on Oct 29, 2012

>> Eventually, it produces (give or take a few bits) a copy of an MP3 file of a copyrighted work.

...which you recognize by having a copy of that work and specifically matching for it. If I were a copyright lawyer, I'd argue that your algorithm for plucking this value out of the stream of randomness was the infringement.

>> I could quite easily create every possible variation of an MP3 file of a given length.

If you're prepared to pay $35 each to register the copyright on all of those, knock yourself out. I'll enjoy not paying taxes anymore.

aes256 · on Oct 29, 2012

> ...which you recognize by having a copy of that work and specifically matching for it. If I were a copyright lawyer, I'd argue that your algorithm for plucking this value out of the stream of randomness was the infringement.

I don't have to recognize it myself. Say I put all the resulting files up for download on an FTP server, and the RIAA stumble across the collection. Within, say, a collection of every possible 30 second long MP3 file encoded at 128kbps, I'd probably be infringing on a few thousand copyrighted works.

For each infringement there'd be many, many more 'infringing' files (i.e. every slight variation on a work that a copyright lawyer would deem indistinguishable from the original work)

> If you're prepared to pay $35 each to register the copyright on all of those, knock yourself out. I'll enjoy not paying taxes anymore.

Apparently you can register copyright for music tracks in bulk. In any case, where I live you don't have to register copyrights.

nathan_long · on Oct 29, 2012

>> Say I put all the resulting files up for download on an FTP server

128kilobits per second * 30 seconds = (128 * 1000 * 30) = 3,840,000 bits per file.

There are 2 to the 3,840,000 possible combinations of that many bits. Ignoring the fact that many of those won't be valid mp3s, each of those is a 480,000-byte file (about 0.46 MiB).

I'm guessing you don't have enough hard drive space to put all those mp3s up. :)

Assuming you did, the RIAA would have a tough time crawling all that content for infringement.

It would make an interesting test for the theory that "linking isn't infringing," since the link would be the only thing distinguishing a song from random noise.

aes256 · on Oct 29, 2012

Obviously I'd set the random generator up such that it operates within the rules of the mp3 specification and only creates valid mp3 files; I don't think that detracts from the experiment.

Storage space is the only major limitation here. With current computing power I could easily have random mp3 files spat out at an alarming rate, such that it wouldn't take too long (I'm guessing a matter of months) until I managed to produce an infringing file this way.

I could probably speed the process up by teaching the 'random' mp3 generator certain patterns to pursue; fade-ins and fade-outs, repetition, etc. Again, I don't think these detract from the substance of the experiment.

It's kind of like teaching someone to play a sport; you show them the rules of the game, and a bunch of 'patterns' that players tend to adhere to. Eventually, they'll make a sequence of movements, lasting 30 seconds or so, near enough identical to that performed by a famous sports star.

nathan_long · on Oct 30, 2012

OK, I got a little help for my sorry math skills. http://math.stackexchange.com/questions/225155/how-can-i-qua...

According to Ross Millikan over there, the number of possible 3,840,000-bit files is a number with more than a million digits. The number of atoms in the universe is only around 10 to the 80. So if you could use the entire universe as your hard drive, storing a bit on every atom, you'd need many, many universes to store those files.
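The arithmetic can be checked with a few lines of Python. This is a rough sketch: it uses logarithms rather than building the huge number itself, and it takes the 10-to-the-80 atom count from this thread as a given rather than a precise figure.

```python
import math

BITRATE_BPS = 128 * 1000   # 128 kilobits per second
SECONDS = 30
ATOMS_LOG10 = 80           # rough atom count quoted in the thread: 10**80

bits = BITRATE_BPS * SECONDS   # bits in one file
size_bytes = bits // 8         # bytes in one file

# Number of decimal digits in 2**bits, computed without materialising it.
decimal_digits = math.floor(bits * math.log10(2)) + 1

# Storing every file takes 2**bits * bits bits in total. At one bit per atom,
# the number of universes needed is that divided by 10**80 (as a power of ten).
log10_universes = bits * math.log10(2) + math.log10(bits) - ATOMS_LOG10

print(f"bits per file:        {bits:,}")
print(f"bytes per file:       {size_bytes:,}")
print(f"digits in 2**bits:    {decimal_digits:,}")
print(f"universes needed:     about 10**{log10_universes:,.0f}")
```

The count of possible files has roughly 1.16 million decimal digits, so "many, many universes" is an understatement by more than a million orders of magnitude.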

You're going to have to use some serious algorithmic bias to get mp3s, much more bias to get non-static, much more to get anything resembling music and containing any English words, etc etc.

Bluntly, you won't get copyrighted works ever unless you're specifically targeting them, for any reasonable value of ever. It's theoretically possible only in the sense that it's possible for someone's DNA to spontaneously appear at a crime scene.

This is why the "songs are just numbers" argument is misguided. Yes, they can be represented as numbers. But you'd never discover them that way.

Dylan16807 · on Oct 29, 2012

All you've done here is take the 'hidden in pi' argument and customize it. If you are only making a sample of random mp3s you have a statistically zero chance of infringing on anything. If you make a thorough set of X-length mp3s then the actual infringement is in the url, because a datastore that has every number is equivalent to one that has no numbers--it's really just an encryption algorithm between url data and mp3 data.
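The "every number equals no numbers" point can be sketched in a few lines. Assume a hypothetical library that holds every possible file of a given length, addressed by position; the functions `file_at` and `index_of` are invented for this illustration, and the byte string is a placeholder rather than real audio.

```python
def file_at(index: int, nbytes: int) -> bytes:
    """Return the index-th entry of a 'library' holding every nbytes-long file."""
    return index.to_bytes(nbytes, "big")


def index_of(data: bytes) -> int:
    """Return the position of `data` inside that library."""
    return int.from_bytes(data, "big")


song = b"\x49\x44\x33 pretend mp3 bytes"  # placeholder, not real audio
idx = index_of(song)

# The library never has to be stored: the index alone regenerates the file.
print(file_at(idx, len(song)) == song)

# The index needs as many bytes as the file it points to.
index_bytes = (idx.bit_length() + 7) // 8
print(index_bytes, len(song))
```

The library contributes nothing here. All of the information lives in the index, which is why the link, not the datastore, is where the song is.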

randomdata · on Oct 29, 2012

If (b) and (c) can be considered derivative works, then should a low-bit hash of (a) not also be considered a derivative? If a given hashing algorithm says the hash of Metallica's latest hit is 42, have I just broken the law by posting this?

Dylan16807 · on Oct 29, 2012

It's not just a yes/no on derivation, the amount of information present in the derivative data matters. If you hashed every 32 bits individually then you would be infringing, but one hash for the whole thing is fine.
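A scaled-down sketch of that distinction: the comment talks about 32-bit blocks, but the example below hashes 8-bit blocks so the brute-force table stays tiny. That block size, the use of SHA-256, and the function names are all assumptions made for illustration.

```python
import hashlib


def small_hash(data: bytes) -> int:
    """One low-bit hash for the whole input: only 256 possible values."""
    return hashlib.sha256(data).digest()[0]


def block_hashes(data: bytes) -> list[bytes]:
    """Hash every 1-byte block separately."""
    return [hashlib.sha256(bytes([b])).digest() for b in data]


def recover(hashes: list[bytes]) -> bytes:
    """Invert per-block hashes by trying all 256 possible block values."""
    table = {hashlib.sha256(bytes([v])).digest(): v for v in range(256)}
    return bytes(table[h] for h in hashes)


work = b"stand-in for a song"

# Per-block hashes carry enough information to rebuild the input exactly.
print(recover(block_hashes(work)) == work)

# A single small hash cannot be inverted: countless inputs share each value.
print(small_hash(work))
```

With 32-bit blocks the same attack needs a table of about four billion entries per run, which is large but feasible, so the per-block hashes still amount to an encoding of the work.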

lmm · on Oct 29, 2012

I've read an argument that the law has considered a five(?)-note sequence enough to identify one song as a derivative of another - and given reasonable estimates for the "space" of possible 3-minute tunes, and the number of (copyrighted) songs currently in existence, there's a significant chance that a "random" new song will be legally considered a derivative of some existing song.

Of course, here we come back to the "color" of bits; one is unlikely to be prosecuted for "copying" a song one has never heard. Though cases like My Sweet Lord are enough to give one pause.