Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey, are suing OpenAI and Meta, each in a US District Court, over dual claims of copyright infringement.
Even if they did train the model on the entire text of the book, that’s still not necessarily a copyright violation. I would think not, since the resulting model doesn’t actually have a copy of the book embedded within it.
Do we know that it isn’t?
How do we “know” anything where the answers are just being made up as part of humanity’s collective cultural game of Calvinball?
Courts in various jurisdictions will make various rulings. Judges will interpret them in various ways. Legislators will chime in with new legislation and new treaties. Internet arguments will churn away with a whole range of assumptions about what is true or false that may or may not have anything to do with reality.
I present my opinion here. I feel it is well informed and I can back it up in various ways when challenged. But nobody “knows” anything because these aren’t laws of physics or math that we’re talking about here.
Or did you mean whether we know if a copy of the book is embedded in the model? That can be more objectively tested, at least.
But the server used to calculate the model would have a copy of it. If training an AI model is not fair use, then the mere act of loading a book you don’t have a license for into the server would be copyright infringement. Like, textbook: it’s an unauthorized digital copy. It’s all very untested legal ground, and it seems like lots of people want to be the first to test it. Not everyone has a great case, but if the courts interpret things a certain way there are going to be lots of payouts, so maybe it’s best to get in line early?
Perhaps, but that’s a separate legal issue from the model itself. You might have committed a breach of copyright in the process of gathering the material that the AI was trained on but the model itself is not a copy of that material and so is not itself illegal to train or use. And perhaps not even that, since downloading a pirated book is not the illegal part (uploading it is).
As you say, there’s some untested legal waters here. But it seems likely to me that the best that Silverman will accomplish is some nibbling and quibbling around the edges.
If you can give the model some vague prompts and get back something close enough to a significant chunk of the work that it could be considered plagiarism had a human written it… then I’d say the same laws that protect against plagiarism should apply there.
It doesn’t matter whether it’s really stored there in some form or not (in fact, it’s probably OK to store copyrighted material on a private server as long as it’s lawfully obtained). What matters is whether the output being distributed to third parties violates the license of the work.