>As submarines do not swim, LLM's do not train:

The neural network behind the LLM is "trained." Yes, it's a term of art, very observant. That doesn't change the fact that there's no substantial reproduction of a work in it. LLMs "learn" to predict text by being fed vectors; the resulting "weights" of that process could be thought of, and even implemented, as a big walkable decision tree for predicting the next token if it weren't for combinatorial explosion. But there's no substantial reproduction, or "copy", in there.
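
To make "predicts the next token" concrete, here's a toy sketch (the vocabulary, shapes, and pooling are made up and nothing like a real transformer): the weights only map a context to a probability distribution over the vocabulary; there's no stored text to look up.

  import numpy as np

  rng = np.random.default_rng(0)
  vocab = ["the", "cat", "sat", "on", "mat"]     # toy vocabulary
  d = 8                                          # toy embedding size
  E = rng.normal(size=(len(vocab), d))           # "weights": token embeddings
  W = rng.normal(size=(d, len(vocab)))           # "weights": output projection

  def next_token_distribution(context_ids):
      # pool the context embeddings into one vector (a stand-in for a real transformer)
      h = E[context_ids].mean(axis=0)
      logits = h @ W
      p = np.exp(logits - logits.max())
      return p / p.sum()                         # probabilities over the vocabulary

  print(next_token_distribution([0, 1]))         # P(next token | "the cat")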

Is there a substantial reproduction, in your head, of a song or a text or a painting you can perfectly recall and can't seem to forget? Be careful, the copyright police will get you, the only way to delete it is to delete yourself!

>feeding into a machine is called copying

Copyright law deals in "reproductions," as far as I understand it. Feeding a copyrighted document as vectors into a program that reconfigures some matrix of vectors, and then using those matrices to output probability distributions that become something tangible, where that something tangible isn't a complete or substantial reproduction of the copyrighted work, is not a copyright violation.
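
As a rough picture of what "feeding a document as vectors" can look like (a hand-rolled sketch, not any real library's training loop; the tokenizer and one-layer model are toy stand-ins): the document becomes a sequence of token ids, training nudges the weight matrix by a gradient, and the ids are then discarded.

  import numpy as np

  rng = np.random.default_rng(1)
  vocab = {"copyright": 0, "law": 1, "deals": 2, "in": 3, "reproductions": 4}
  d = 8
  E = rng.normal(size=(len(vocab), d)) * 0.1     # embedding weights
  W = rng.normal(size=(d, len(vocab))) * 0.1     # output weights

  def train_step(text, lr=0.1):
      # "feed the document as vectors": text -> ids -> embeddings, then a gradient nudge
      ids = [vocab[w] for w in text.split()]
      for ctx, target in zip(ids, ids[1:]):
          h = E[ctx]
          logits = h @ W
          p = np.exp(logits - logits.max()); p /= p.sum()
          grad = np.outer(h, p - np.eye(len(vocab))[target])  # cross-entropy gradient w.r.t. W
          W[:] -= lr * grad                      # the weights shift slightly; no text is retained

  train_step("copyright law deals in reproductions")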

If you think it is, amend copyright law, or take it to court and let a judge settle the matter, and hope it's in your favor.



How is the copyrighted document being "fed as vectors" if not through a process of copying the data that it contains? Indeed, the semantics are surely going to get their day in court. The fact that you felt the need to qualify "reproduction" with the word "substantial" probably means that copyright law will end up deciding how much "substance" is allowed to be, or not to be, copied.

About your question: """Is there a substantial reproduction, in your head, of a song or a text or a painting you can perfectly recall and can't seem to forget? Be careful, the copyright police will get you, the only way to delete it is to delete yourself!"""

This equivalence between "perfect" human recall and a copy of the data input into an AI model is a bit of a strawman, I think: it is the distribution of copied information that copyright protects against, the information has been uploaded in vectors, and have fun demonstrating it is not being used in producing this or that model output. If I learn a popular song and do a cover, copyright law covers that: I owe rights.


>it is the distribution of copied information that copyright protects against, the information has been uploaded in vectors

The "vectors" are not "uploaded" or "copied" into a file or neural network, they're transformed. In the context of stable diffusion, They're transformed, progressively "noised" or "corrupted", "diffused" with random gaussian noise, and in the context of stable diffusion, it's "trained" and "learns" how to "denoise" various "noised" stages of images represented as a vector of pixel data into their original form.

Then, when it comes to generating images consistent with an "annotation" or "prompt", the model is "conditioned towards" or "biased towards" it with more "training": an input image is noised, and that vector of pixels is concatenated or combined with a vector representing the image's annotation. The model then "learns" to denoise with that conditioning information, the annotation.
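
In training-step terms, that conditioning looks roughly like this (a sketch only: denoiser is a placeholder for the network, embed_text for a real text encoder such as CLIP, and real Stable Diffusion injects the text via cross-attention rather than simple concatenation):

  import numpy as np

  def embed_text(annotation, d=16):
      # placeholder for a real text encoder; just maps the annotation to a fixed vector
      r = np.random.default_rng(abs(hash(annotation)) % (2**32))
      return r.normal(size=d)

  def training_step(denoiser, x0, annotation, t, alpha_bar, rng):
      eps = rng.normal(size=x0.shape)
      xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
      cond = embed_text(annotation)              # the "conditioning information": the annotation
      eps_pred = denoiser(xt, t, cond)           # the model sees the noised image plus the annotation vector
      return np.mean((eps_pred - eps) ** 2)      # it "learns" to predict the noise, guided by the text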

Then, you can take the trained model and do the same thing with just a text prompt as a vector, combined with a vector of random Gaussian noise, and no input images.
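
And generation is then roughly this loop (again a sketch: denoiser is the placeholder trained network, cond the embedded prompt; real samplers such as DDIM differ in the details):

  import numpy as np

  def sample(denoiser, cond, shape, betas, rng):
      alphas = 1.0 - betas
      alpha_bar = np.cumprod(alphas)
      x = rng.normal(size=shape)                 # start from pure Gaussian noise, no input image
      for t in range(len(betas) - 1, -1, -1):
          eps_pred = denoiser(x, t, cond)        # predicted noise, conditioned on the prompt
          # DDPM reverse step: subtract the predicted noise, move toward x_{t-1}
          x = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alphas[t])
          if t > 0:
              x = x + np.sqrt(betas[t]) * rng.normal(size=shape)
      return x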

That's, very simplistically, how it works.

Once trained, the output is not a substantial reproduction of the input images and annotations. The model takes the random noise and "tries" to denoise it into something consistent with the prompt, with the conditioning to guide it.

Your attempt at a cover would be a substantially similar reproduction; your goal is to make a reproduction. The model, by contrast, "learns" to generate images consistent with an annotation/prompt, because that "goal" is conditioned on top of how it "learned" to denoise the images.



