
Copyright was never conceived to apply to technology like this, and the onslaught of copyright suits (like the NYT one) underscores its fundamental rent-seeking nature. No doubt these latest changes to GPT-4 are in response to the suits they're presently fighting. However these cases are ultimately resolved, the end user will be the biggest loser.


People generate all of the data going into the system and then the middlemen (OpenAI, Microsoft, Google, Big Tech middleman of the week) reap a disproportionate centralized benefit. That is a bigger problem than the so-called rent-seeking behavior of copyright holders in this case, as it has the net effect of leveraging human creativity to devalue that very creativity and continue the erosion of the middle class.

Bad things happen when you let middlemen get the upper hand, like the American health care system, or big finance disconnected from the real economy. I'll vote against the middleman every time in favor of the original value creator, because society goes down the toilet when middlemen win.


What is the alternative, though? I agree with the sentiments of the anti-AI people who want it to pay for copyright, but I never hear any consideration of what comes next.

This is going to end up being the music industry all over again. It's going to be impossible for any individual or small company to get the rights needed, and instead we're going to get massive content labels selling the rights, or only giant corporations able to hop through all these new hoops.

We don't want a repeat of that as a society, creating yet another leeching middleman and horrible industry favoring only the incumbents.


I don’t see it ending like that. LLMs will just be taught not to emit copyrighted content verbatim. Whatever the courts end up deciding, they’ll be trained to stay just this side of legal. I’m certain it’s already being worked on.
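
One plausible mechanism (purely my speculation, not a confirmed technique from any vendor): hash every long n-gram of the protected corpus offline, then scan candidate output for overlaps and regenerate on a hit. A minimal sketch in Python, assuming a made-up corpus and an 8-word threshold:

    import hashlib

    NGRAM = 8  # assumed threshold: an 8-word overlap flags likely verbatim copying

    def ngram_hashes(text, n=NGRAM):
        # Hash every sliding window of n words, lowercased.
        words = text.lower().split()
        return {hashlib.md5(" ".join(words[i:i + n]).encode()).hexdigest()
                for i in range(len(words) - n + 1)}

    # Built once, offline, from the (hypothetical) protected corpus.
    protected = ngram_hashes("full text of the protected corpus would go here")

    def looks_verbatim(candidate):
        # Any shared n-gram suggests the model is echoing training data.
        return bool(ngram_hashes(candidate) & protected)

A sampling loop would regenerate or truncate whenever looks_verbatim returns True. A real system would need paraphrase detection on top, since trivial rewording defeats an exact-match check.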


Yes, when I read "rent-seeking" I assumed OP meant OpenAI.

Google search at least was just a link to content we wrote. OpenAI just steals it.


OP was obviously referring to the copyright holders whose data he feels so entitled to.


Open models are a thing. Rather than attacking the technology (which is great) with litigation to hurt a few bad actors, we should attack the capitalist rules that enable rent-seeking middleman parasites to flourish.


Yes. If you think about it, the individual is being subjected to a man-in-the-middle attack, cleaving a creator from their creation via the use of consent agreements for providing a platform. Rent seeking.


The artist or author might end up being the loser, and the multi billion corporation harvesting their work might make an unearned profit off it.

To me personally it's crazy how many people think that we would be better off without any kind of copyright protection. Copyright solves many real world problems and protects people against having a company profit off their work... but as soon as AI is involved so many people start to advocate for throwing it away.


If companies are required to purchase licenses for everything they train on, it will guarantee that only huge corporations with deep pockets can produce powerful models. Microsoft will be slightly inconvenienced, Stability AI will be destroyed. Some artists might get a payday, but most of the money will go to companies with large copyright libraries like Getty. The general quality of all models will decrease. I don't see any other possible outcome.


Almost a year ago, I made¹ the following prediction:

It looks like to me that many companies want to use the new generative tools, and many others want it not to impact their stake in the copyright system. I’m pretty sure they will both come to a compromise which will leave most users without any benefits, either from reduced copyrights or from availability of generative tools. It’s what would make both powerful parties satisfied (if not happy), and will impact the status quo the least.

Say, for instance, that they instituted a mostly mandatory licensing scheme, so that an individual artist had no choice but to allow use of their art as input when creating generative tools. People using art in this way have to pay a rather high licensing fee, but it is paid not to the artist but to some sort of central copyright office. Huge copyright holders can also pay an exorbitantly high fee (to the same recipient) to opt out of licensing. Win-win-win: existing copyright holders keep their existing copyrights, only large-ish actors can create new generative tools, and new political positions and institutions are created with lots of money flowing in. Of course, artists then get screwed by being co-opted into generative tools which they can never afford to create themselves, and the general public gets robbed both of the opportunity of using and creating new generative tools, and of any less restrictive copyright law.

1. <https://news.ycombinator.com/item?id=35191112>


For music there are already similar mechanisms in place in many countries: in Poland it's ZAiKS, in the US it's ASCAP. They collect fees from organisations playing copyrighted music publicly.

(I agree that it would be terrible if they began collecting for other kinds of copyrighted content and for training purposes, because it would lead to centralisation.)


Sacem in France.

They're the worst; e.g. they will notoriously come after you even when you play public domain music.


I hope you’re wrong, but I think you’re right.


In agreement with your "slightly inconvenienced": the world's dozen or so largest publishers have market caps averaging below $10bn each.

"Even" just OpenAI alone could pocket a few of them if they need easy sources of acquiring content.

This includes the largest educational publishers. And while these publishers do not own all their content, the reality is most authors earn so little that an "allow AI training on my work for $x extra" clause would give them vast amounts of content.

As for Getty, Getty has a market cap of "only" $2bn. The big players will easily afford to build or buy libraries like that.

But of course it will be the end of decent open models.


> it will guarantee that only huge corporations with deep pockets can produce powerful models

It will also guarantee that the financial means to continue making that data, which is clearly so important, are preserved. Someone has to pay for the crafting of the data.


For many artists this is not about "getting a payday" and is instead about "not being replaced by AI". So the outcome you describe would probably sound great to those artists.


How did dock workers feel when containerized shipping started gaining popularity? Should we have let them all continue putting things on ships piece by piece and stacking and unstacking each shipment by hand?

How did portrait artists feel when photography was gaining popularity? Should we have let them control the industry so that if we want to record a memory of a person we must have them stand or sit for hours while someone draws them?

etc.


Man, there's always someone in these discussions who will smugly tell us that this is all inevitable and our empathy for the creatives in our economy is misplaced. To you I give a hearty fuck you.


No, I am describing what happens when technology makes the market for certain jobs and talents change. The stevedores may have had a bad time for a while but our modern society only exists because we can ship things quickly and efficiently.

I feel bad for copy editors and people who write corporate blog posts or design logos or come up with ad jingles, but their niche is gone now and they need to adapt.

Thanks for being respectful and cordial though.


I often see these processes described as passive economic mechanisms we are subjected to, rather than as decisions we all make collectively and actively accept, with excuses rooted in the neoliberal understanding of our time as to why those people deserve to have their jobs made redundant and their livings wrenched from them.

To me, it's a kind of cowardice that people like you shrug your shoulders at and sigh and say "that's just the way things are". You can say that's just how the markets work. I don't have to respect you for it.


I am not saying that artists are going to stop being a thing. We will keep buying books written by people and watching movies directed by people, and people will still make music and what have you, but it will be different. The music industry was completely different in 1900, when there were no mass recordings available; different again in the 1950s with popular radio and records; and the 2000s brought the internet and MP3s.

Things change -- people's jobs will be different. It isn't going to mean artists will stop making art or machines will make everything bland; it is just a new tool that will change industries and make things easier for people to do well, and thus make more art. Some people won't be able to live well doing the same thing they do now, but what they do now isn't what they would have been doing in their grandparents' time.


I'd say you are creating a bit of a straw man there. The commenter you are responding to didn't say that's just the way things are. It feels like you are making their argument for them.

They gave some examples from the past and showed that society adapted.

We could try to improve our society and systems to have a safety net, education that allows us to adapt to rapidly changing technologies, etc., sure, but that's a whole discussion in itself.


If you give people freedom (good thing, right?) and tools exist to perform a task in a variety of different ways (some faster/more efficient than others) people will naturally gravitate towards using the most efficient tools to gain a competitive advantage, and other people will prefer work produced with those tools because it's better/cheaper. As long as better tools exist and people are free, this is just the way things are gonna play out.

If you're angry that independent artists are being fucked over by bigcorp, AI tools aren't the battle you should pick, because it's a guaranteed loss for a lot of very logical reasons, and it's just another example of a pattern of oppression enabled by our social and political systems. Even if by magic you managed to change something there'd just be another inequality coming down the pipe shortly after.


The good artists are already using AI, just like they photobashed, traced templates and used camera obscuras to produce better art faster down through the ages. A true artist transcends medium to focus on message.


AI is a tool. Different artists use different tools. Some good artists use AI. Many good artists will not be interested in that particular tool.


I don't think most people believe we are better off without copyright. I think people believe that copyright protects specific concrete expressions and that fair use exists to allow others to build on ideas in transformative ways. It's not clear where building a learning model from this work sits in this context, hence the court cases.

Also, it's a subtle difference, but copyright is not intended to solve the problem of companies profiting off of artist's works, it is intended to promote the progress of science and useful arts. It attempts to do this by giving creators limited exclusive rights.


How does locking away most of the knowledge, research and learning materials in the private vaults of a few publishing houses for their personal profit promote the progress of science I wonder?

Even scientists are tired of the predatory and rent seeking behaviour of the publishers they have fallen prey to and are looking for any way out.

This is not promoting progress; it is the opposite of it.


I think it grossly mischaracterizes what copyright protects to describe it as "most of the knowledge, research and learning materials". Still, I agree that the extensions of copyright length and the behavior/incentives of publishers work against the original intent of copyright. Having said that, publishers only have control of copyright because authors give it to them. Copyright rests with the creator — the system where people are compelled to sign this over to publishers is a different (but of course related) problem. Scientists who are tired of the predatory behavior of publishers have other choices today. It's not clear what alternative you are proposing.


> vaults of a few publishing houses for their personal profit

Because they made it, it wouldn't exist without them, and others value it. If this data wasn't objectively valuable, we wouldn't be having this discussion.


> but as soon as AI is involved so many people start to advocate for throwing it away

No, I've been hearing it for years.

Don't try to portray people holding this opinion as if they were AI zealots.

It's brought up in discussions about torrents, Disney, streaming platforms, music, etc.


Yes. I've been aware of the intellectual property debate at least back to the great crackdown on sampling around when Paul's Boutique was released. And following it in depth from around the time Lawrence Lessig made arguments to the Supreme Court.

A large chunk of the tech community was following that case, and most on HN seemed highly sceptical of the status quo.


How does it protect a small artist against a large corporation profiting off their work?

I don’t even have the means to start litigation, let alone see it through.

It only protects those who are already moneyed and/or famous enough to negatively impact a large corporation’s reputation - and even in those cases it’s mostly for the benefit of the lawyers and bureaucrats who make a living off it.


If you register your work, which requires some effort but is not prohibitively expensive or difficult, you can sue for statutory damages, which are substantial enough (up to $150k for willful infringement) that lawyers will work on contingency. There are many individual artists who have been successful here. The law actually has some real teeth that individuals can use to protect their work.


It would be nice if there were a preventative concept, where the role of the creator as predator, seeking and suing, was mostly reversed, so that others would instead ask for permission, and maybe get the rights to copies through a fair exchange of money, like a license. We could call this "copy rights".


> The artist or author might end up being the loser, and the multi billion corporation harvesting their work might make an unearned profit off it.

Exactly like before AI you mean then? Except instead of OpenAI it was Disney, Universal and other large corporations on that same seat.

>to me personally it's crazy how many people think that we would be better off without any kind of copyright protection.

Why exactly should I care that the old billionaire copyright corps are dying? What would I gain from defending them, when what they did, as far as I remember, was privatize culture for their own benefit, and even exert a large negative influence on tech?

The copyright system, being so unequal and skewed towards multi-billion-dollar companies, dug its own grave.


The copyright issue seems unchanged. Anyone taking wholesale quotes from another entity is likely in violation of copyright law. If someone uses AI, and posts the output from it as their own work, and that work contains copyrighted material, the person who posted it is in violation of copyright. AI is just a tool they chose to use and they remain responsible for remaining in compliance with copyright law.

What we need is a reasonable way for people using AI to determine which parts of the text or images they have are subject to copyright.


Just a tool that required billions of dollars worth of copyrighted material to be created.


How can you possibly argue that taking a bunch of text and creating an application that creates text isn't transformative?

The tool itself unambiguously is fair use.


Whether something is transformative is only one of the four factors courts weigh in a fair use analysis.


> Anyone taking wholesale quotes from another entity is likely in violation of copyright law

What do you mean anyone?

Is Sony liable when you play an entire movie on their TV? Is Nuance liable when you use their Dragon screen reader to verbalize an entire NYT article? Is Google liable when you display an entire webpage in Google Chrome? How about if you switch to Dark Mode? Is that a transformative use?

Why would AI be any different? It’s just a tool at the end of the day!


The problem is people at large companies creating these AI models, wanting the freedom to copy artists’ works when using it, but these large companies also want to keep copyright protection intact, for their regular business activities. They want to eat the cake and have it too. And they are arguing for essentially eliminating copyright for their specific purpose and convenience, when copyright has virtually never been loosened for the public’s convenience, even when the exceptions the public asks for are often minor and laudable. If these companies were to argue that copyright should be eliminated because of this new technology, I might not object. But now that they come and ask… no, they pretend to already have, a copyright exception for their specific use, I will happily turn around and use their own copyright maximalist arguments against them.

(Copied from a comment of mine written over a year ago: <https://news.ycombinator.com/item?id=33582047>)


> these large companies also want to keep copyright protection intact, for their regular business activities

Care to share an example? I haven't heard of OpenAI or anyone else arguing copyright or trying to sue anyone for infringing it. If anything, their business decisions rest on an assumption that copyright will not help them protect their work.


Prime example for you right here:

https://nypost.com/2023/12/18/business/openai-suspends-byted...

100% pure unadulterated hypocrisy from "OpenAI".


T&C yes, but not copyright. This is fully consistent with them opposing copyright and not opposing paywalls/api limitations.


Don't they have an explicit T&C that says you are not allowed to use their output for training other models?


T&C yes, but not copyright.


I was mostly thinking of large companies also creating their own AI, like Google, Microsoft, etc.


If their model was leaked, you can be sure they’d claim copyright protection on it.


I wanted to say that they are too smart to expect the DMCA to protect them.

But then, I think that surely they would use copyright to block competition from using their model directly.


Because ChatGPT users are the only people that are worth considering.


OpenAI is a force trying to cut a slice from the copyright pie that the big copyright hoarders hold. The hoarders will not strike back and try to kill OpenAI's business, because they will not be able to kill the technology itself in any case. So, obviously, it's better for them to have OpenAI as a partner and share some profit with them to control the AI field, than to kill this one and wait for another AI menace to rise.

OpenAI is not the one who would kill copyright. They just want their cut.


Why should 'big tech' corporations be allowed to use AI to remix/mash-up human-generated content all of a sudden when creative individuals have generally been prohibited from doing it for so long?


Wow, I didn't know creative individuals have been banned from remixing copyrighted material in their own private works.

We must tell the millions of kids who doodle characters in their notebooks that this is prohibited.


The data is not the technology.


[flagged]


Physical property is a consequence of the laws of physics. If I have a gold coin in my hand, then you don't have that gold coin in your hand. If you want the coin then you can either trade for it or fight me, but either way only one of us can have the coin. Even a bird with a worm knows a concept of physical property.

Copyright is something that was invented relatively recently, a few hundred years ago, because of a new technology back then: the printing press. Before the printing press there was no need for copyright.

Now today we again have a new technology in neural networks, and it's entirely possible that the realities of this technology push us back in the other direction, undoing what the printing press did.


The world you describe is the one that turned former communist countries into the underdeveloped entities they are today. By not respecting people's right to own property, you disincentivise them from adding value. The printing press analogy is relevant: similar to how we made rules about how that tool could be used, we now need rules on how to protect people's creations from being taken away by force using AI. Physical property rules are not the result of "physics". They are the result of evolving beyond the savagery of taking by force that which doesn't belong to you. If it weren't protected by law, it would be protected by the sword. I get that some people would prefer that type of living, but by and large civilised humans don't.


> Physical property is a consequence of the laws of physics. If I have a gold coin in my hand, then you don't have that gold coin in your hand.

So do you believe people should only be able to own real estate that they are currently physically occupying? By your reasoning, no one could ever own a piece of land larger than what they were currently standing or lying on. So no one could own land. So abolition of all property rights, essentially.

I'm hella down for this, but then we should also be able to walk into the OpenAI offices and inspect their source code, cause that shit won't fit in any one hand afaik.

This is an incredibly naive misunderstanding of how property rights work: they are 100% a social and conceptual construct. IANAL, but I believe you are confusing property with possession.

But, yeah. I'm down: no one can own anything that isn't currently in their hand. Let's go liberate a lot of fake property that is "owned" in violation of the laws of physics!


and more than that: even if we would agree that works should not be protected, we're currently in a highly-asymmetrical position where big players like Microsoft can take people's hard work but give nothing back. The only way to survive under a copyright regime is viral licensing.


You probably didn't mean to, but implying that's how the whole of humanity works is a bit out there.

It's the local rules (geographically and temporally), sure. But rules can be changed.


Sure, for reasonable copyright terms. Currently, if you create something when you're young and live long, a 150-year copyright term is entirely possible (life + 70 years).

Much as I appreciate someone's rights to their work, things should enter the public domain in something more like 10 to 20 years. Even then, copyright protections are too strong while in force. You published something so people would use it; your ability to limit how they use it should be restricted to protecting you from folks selling it as their own. I am also in favor of forced standard licensing terms.

Like, say, after five years there should be a standard streaming licensing fee for films and shows, such that anyone can broadcast/stream/sell copies for a flat rate.


We also have to consider the cost of enforcement. We can't be soaking up millions and billions of taxpayer dollars to protect copyrights or to field complaints that aim to protect mutated copies of said works. Just like you don't send a SWAT team to enforce parking tickets, we have to consider what is actually at stake for the New York Times or other copyright holders before clogging up the courts.

There's a reason lawyers are so quick to file a suit, and it's that it costs nothing to sic the dogs of the American justice system on others.


Kim Dotcom's adventure calmed things down for the last wave of digital IP theft. Once that happens to one or two AI copyright disbelievers, the rest will calm down.


But is it really a problem if the AI is transmitting the information in its own words? And even if that is considered illegal, doesn’t it significantly diminish said crime?


AI doesn't transmit information in its own words. It has no "own", no "self". It does what it was programmed to do, just like any other type of software. It turns out that some people using AI have made it ingest content without permission so they can resell it for profit. That should not be permitted. My property is not yours to take unless you agree to my terms. I did not give you permission to download my data, art, code or text, to ingest it into a token database and then resell it in any shape or form, derived or not. No ifs, no buts. If you want it, you have to pay for it or respect my terms. The bulk of AI companies respect that. A handful of sociopaths don't. They are the issue.


I wouldn't call ANYONE disrespecting terms and conditions they may have agreed to a sociopath. Not everything in a contract is enforceable just because it's written there, whether or not both parties signed it. And unless it's spewing out copyrighted materials "verbatim" there is an argument to be made that the LLM learned to talk from an open source and inserted knowledge from a copyrighted one.

However this turns out for private AI, I hope at the very least it can be considered fair use. Monetized LLMs can be forced to pay up or follow terms but individuals should be able to pool together and create open source models. I'm not saying I have the exact legal arguments for why this would work but LLMs in their current forms need to exist.


I absolutely agree that LLMs should exist. Torrents still exist and have their purpose. Criminals have always argued that their crime is not really a crime and found all kinds of arguments in its favour. Similarly, people developing AI that doesn't respect others' property use all sorts of wild arguments in their favour: AI learns like a human, it benefits society, other countries will use it against us, and so on. That doesn't mean we should give in to their demands to destroy society and people's lives so they can have a competitive advantage over honest people. The fact that they want to steal, destroy the entire industries they take from, and demolish norms so they can make their software appear intelligent makes them sociopaths.


A significant portion of the training set for most image generation tools is stuff made in the last 10-20 years harvested from the internet, if not the last 5 years. We're not talking about 150 years of copyright protection here, we're talking about the time frames you suggest. Artists want to protect their own work and their livelihood, and AI is being trained on the work they're actively putting out right now. You would have to shorten copyright duration to something like 5 years to come remotely close to making modern image generation models possible without violating artist copyrights.

Text is different and much less difficult since its history as a medium is much longer - if you excluded the last 10-20 years of prose from your LLM it would probably still be very good at writing. But excluding the last 20 years of digital illustration and photography would be limiting yourself to a much lower-fidelity training set.


Your work is not free from derivation, which is what GPT-4 produces in the overwhelming majority of cases. If there are small outliers where it regurgitates something word for word, we can handle them like most other instances of copyright infringement: file a takedown notice, and that particular phrase can be explicitly filtered out after output generation. Easy.
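
As a rough sketch of what that post-generation filtering could look like (my own illustration; the phrase list, names, and redaction policy are all made up):

    import re

    # Hypothetical blocklist, populated as takedown notices come in.
    TAKEDOWN_PHRASES = [
        "an exact sentence a rightsholder has flagged",
    ]

    def filter_output(text):
        # Redact any flagged phrase, case-insensitively, before the
        # response reaches the user.
        for phrase in TAKEDOWN_PHRASES:
            text = re.sub(re.escape(phrase), "[removed]",
                          text, flags=re.IGNORECASE)
        return text

Whether maintaining such a list scales to every article ever published is exactly the objection the reply below raises.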


I agree about the derivation bit, but “File a takedown notice for every NYT article ever published after proving GPT can reproduce each one” is not what I would call a clean solution. That’s basically a regulatory DDOS attack.

Current copyright law is simply not equipped to handle LLMs, I think.


It's what they do anyway. They file suit after takedown after DMCA and never hesitate to drag court cases out over months and years, wasting everybody's time to make sure grandma pays up because someone in her house was using Napster.


I, for one, will enjoy watching lawyers and AI fight to the death.


I already love AI too much to enjoy it.


No, no one gets to get away with breaking the law over and over and over again with a simple "whoopsies" each time they get caught. There needs to be penalties.


So YouTube should be shut down the first time a copyrighted work is uploaded to it?


Yeah, except I paid for the work I derived mine from. I paid taxes to learn in school, I paid for textbooks, I paid to see a painting, I paid to watch a movie, and I paid even to learn how to speak and do math. Stop stealing and pay what you owe. Easy.


Are you really suggesting that learning from watching others, going to the library, taking from the public domain, etc. is a form of theft?


>Corporate communism want to take that away

contentless, thrashing drivel


I'm sure you'd feel the same way if it was your life's work these systems were hoovering up and regurgitating.



