Teenagers with Limewire: pirates. OpenAI with terabytes: pioneers. Guess file si...

edmundsauto · on Feb 23, 2024

This is a shallow take. Teenagers are copying bits for their own gratification. OpenAI built a fascinating tool that enables other people to create things by transforming bits.

Put another way, one group enables people to make things. The other does not.

endisneigh · on Feb 23, 2024

How do you know teenagers were not using the content to create their own music?

Funny how BigCorp gets benefit of the doubt. By the way, despite the name, OpenAI and teens are both doing it for their own gratification.

edmundsauto · on Feb 23, 2024

Some people were making really cool remixes! It’s not all or nothing, but if you can’t see the difference between making a tool for others to use, and copyright violations, I don’t see value in continuing the discussion.

johnnyanmac · on Feb 24, 2024

>if you can’t see the difference between making a tool for others to use, and copyright violations, I don’t see value in continuing the discussion.

If a tool I used is suspected of containing copyright violations, it'd get sued. This happens even in software, as Google v. Oracle has shown (among dozens of other cases, maybe hundreds by now).

And lo and behold, OpenAI is getting sued on suspicion of copyright violation. "Tools for others to use" isn't a defense against copyright violation, and never has been.

jprete · on Feb 23, 2024

OpenAI built it for a combination of their own gratification and cold hard cash. What do you think motivates AI researchers? What do you think motivates OpenAI employees now? Sam Altman? Microsoft?

edmundsauto · on Feb 23, 2024

I try not to guess what motivates complex animals such as humans when in an abstract discussion like “what motivates Sam Altman”. Do you have any inside knowledge of what motivates him or are you guessing because you have correlated things you don’t like with a company or individual?

ricc · on Feb 23, 2024

I try not to guess what motivates complex animals such as humans when in an abstract discussion like “what motivates teenagers”. Do you have any inside knowledge of what motivates them or are you guessing because you have correlated things you don’t like with a certain demographic?

edmundsauto · on Feb 23, 2024

Not trying to be inflammatory, but it's not really about teenagers, or about OpenAI's intentions. We can look at what they are doing.

One group is downloading things other people made, sometimes transforming them - but we certainly haven't seen an explosion of remixes at the scale of OpenAI creations. The other group, OpenAI, makes tools that ingest copyrighted material and enable people to make a huge number of more complex transformations than the original "remix" culture, where the inputs are usually quite visible.

FWIW, I don't even really think the content pirates have as terrible a name as in GP's comment. I certainly have no criticism of them, especially considering that's how I got my start in technology. It's fine, it's just not as cool or as widespread as GenAI.

johnnyanmac · on Feb 24, 2024

>but we certainly haven't seen an explosion of remixes at the scale of OpenAI creations.

yeah we have. It's just that there was no Twitter/Facebook/Instagram/Tiktok/Vine/Youtube/Reddit and 20 other sites with more people on them today than there were on the internet 20 years prior. But if you browsed the Livejournals and other relics of the early 00's these aren't hard to find at a proportional mass. This drove a lot of MySpace to the point where the post-Tom era chose to try and pivot into the music service angle over a Facebook competitor. And a lot of that was possible thanks to being able to easily access rips of CD's.

Ironically enough, the main thing holding back music from being as profitable as photos was the music industry itself. They were so aggressive in shaping copyright and hoarding everything into Vevo that they lost billions as new media shaped itself. Squandering talent instead of grabbing that talent for themselves to profit from, trying to remain the trend setter instead of expanding or leaning into emergent genres, surrendering that waning control to a subscription service (which consistently remains unprofitable) instead of establishing a platform themselves to profit from (from in-house talent and indies alike). So many missteps, and it ends in artists no longer being able to make money from the music themselves.

>The other group, OpenAI, makes tools that ingest copyrighted material and enable people to make a huge number of more complex transformations than the original "remix" culture, where the inputs are usually quite visible.

under what metric? It's weird to talk about "remix culture" and argue that AI can transform it further... at which point it's no longer a remix and arguably an original song. Which people already do.

Some artists are fine focusing on remixing, but remixing for others is a step towards building the talent to make their own music, and hopefully the remixes establish a brand others want to follow.

>It's fine, it's just not as cool or as widespread as GenAI.

I don't think any people starting their careers in tech in the '90s-early '10s would be here if "cool" was a prerequisite for their happiness.

jncfhnb · on Feb 23, 2024

Open AI is building things for their own gratification. Open companies are building things for others to create things.

johnnyanmac · on Feb 24, 2024

>Open companies

sure do wish we had those.

jncfhnb · on Feb 24, 2024

Stability is more than good enough

bestcoder69 · on Feb 23, 2024

limewire teens were pioneers too

notyourwork · on Feb 23, 2024

What does piracy have to do with generative AI? I don’t understand your analogy.

endisneigh · on Feb 23, 2024

Useful generative AI is only possible by the same collection of copyrighted material previously deemed illegal. I imagine a model trained solely on material explicitly marked for AI use would be significantly worse.

ghshephard · on Feb 23, 2024

On the flip side - almost all (say, 99.99%) human engineers, artists, technicians, scientists, etc. have mental models trained on copyrighted material.

Nobody (to my knowledge) has ever said that you can't train on copyrighted material - what you aren't allowed to do is copy or directly plagiarize. Something that all the generative systems are going to great pains to remove from their system where possible.

Are they doing a perfect job - nope. But they'll get better, and this is good - copyrights are supposed to prevent replication, not use of their material.

jprete · on Feb 23, 2024

Lots of people have said that it's illegal to train on copyrighted material without a license to it.

Also, performance licenses for movies, plays, recorded music, and copyrighted scores are all required. The lack of copying is not relevant there, the performance alone can be infringing.

ghshephard · on Feb 23, 2024

I'm intrigued - can you point me to any credible commentator or article that makes the argument "it's illegal to train on copyrighted material without a license to it"? That seems entirely contrary to the spirit (as I understand it) of copyright law, which is that it grants you a right to copy.

I like that you bring up music - almost every musician you have ever listened to (some exceptions of course) developed their talent by learning from others - chords, bridges, etc... And I'm just as certain that close to 0% of them had a "license to learn" from their material.

jprete · on Feb 24, 2024

I'm talking about ML training. I think human training is expressly covered by fair use (i.e. copying for educational purposes). Sorry for the confusion, I misread your comment.

johnnyanmac · on Feb 24, 2024

>Are they doing a perfect job - nope. But they'll get better

I lost all optimism with tech hoping "they'll get better" quite a while ago. No, it's time to regulate them before they burn the bridge this time.

teachinghacker · on Feb 23, 2024

It uses a lot of copyrighted content without permission.

jonathankoren · on Feb 23, 2024

What permission would be needed? It's READING it.

Are there problems when it reproduces it verbatim? Absolutely, but that's not what the copyright maximalists are talking about. They're saying it's a violation of their right of reproduction when someone simply reads their work.

I remember when people were upset that someone would (gasp!) link to their site without permission. Especially if it was a "deep link".

Expanding copyright in this novel way obviously is going to lead to a whole slew of problems, least among which is going to be ensuring regulatory capture by the largest of the largest simply because only they will have the money to license reading.

It's a shakedown by industries that no longer have a viable business model.

johnnyanmac · on Feb 24, 2024

>What permission would be needed? It's READING it.

and then storing it in a database, as shown by the ability to nearly replicate images with enough prompting before they band-aided a fix over it (which does not remove it from their database). It's clearly not just "reading". You can argue the same for a human mind, but it's a lot easier to peer into a mind of code for now (and honestly, by the time we can accurately read brainwaves LLM's won't even be in the top 10 of ethical concerns anyway).

All that aside, web scraping has been legally contentious for over a decade. This mass scraping for commercial LLM usage is honestly making a horrible argument for that already dubious factor.

>Expanding copyright in this novel way obviously is going to lead to a whole slew of problems, least among which is going to be ensuring regulatory capture by the largest of the largest simply because only they will have the money to license reading.

It's probably for the best, since at that point at least the owners of the data are getting paid (though there are other grey areas to iron out, especially with user-generated content being sold as if the site "owns it", while being legally exempt from being sued for hosting it). The opposite effect just means the corporations win indirectly instead of directly, with less money flowing around. A company that can outspend the competition can also outspend on hardware to process faster, scrape more, and polish the final effects. There's no endgame here where the corporation loses and the indies win, short of some absolutely radical policy changes.

pengaru · on Feb 23, 2024

Generative AI is computational plagiarism of diffuse, but still copyrighted, sources.

brigadier132 · on Feb 23, 2024

> computational plagiarism

Plagiarism is copying without attribution. Transformative works do not qualify as plagiarism.

dylan604 · on Feb 23, 2024

when does it credit/attribute who/what/when/where the data it copied from?

brigadier132 · on Feb 23, 2024

You don't need to credit for a transformative work. LLMs don't regurgitate like you think they do.

pengaru · on Feb 23, 2024

> LLMs don't regurgitate like you think they do.

"Copilot has been found to regurgitate long sections of licensed code without providing credit — prompting this lawsuit that accuses the companies of violating copyright law on a massive scale."

https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

brigadier132 · on Feb 23, 2024

Like I said, they don't regurgitate the way you think they do. Doesn't mean it can't regurgitate data if it's overfit to the training data.

johnnyanmac · on Feb 24, 2024

>they don't regurgitate the way you think they do.

Copyright is about the outcome, not the method. Short of edge cases where two contemporary inventors in different parts of the world converge upon the same novel idea at the same-ish time, the method doesn't change how we interpret infringement.

So the fact that it's capable of doing it is enough to bring about legal suspicion.