• artificialfish@programming.dev
    2 days ago

    You could always just do reverse search on the open dataset to see if it’s an exact copy (or over a threshold).

    You MIGHT even be able to do that while masking the data using hashing.
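One way that could work, as a hypothetical sketch: the dataset owner publishes only SHA-256 hashes of fixed-size word windows, and anyone can check a model's output against that hash set without ever seeing the raw text. All names here (`CHUNK_WORDS`, `build_index`, `was_memorized`) are made up for illustration, and the 8-word window size is an arbitrary assumption:

```python
import hashlib

CHUNK_WORDS = 8  # assumed window size for exact-match detection

def _hashes(text):
    # Hash every sliding window of CHUNK_WORDS words.
    words = text.lower().split()
    for i in range(len(words) - CHUNK_WORDS + 1):
        chunk = " ".join(words[i:i + CHUNK_WORDS])
        yield hashlib.sha256(chunk.encode()).hexdigest()

def build_index(corpus_docs):
    # Publish only the hashes; the original text never leaves the owner.
    index = set()
    for doc in corpus_docs:
        index.update(_hashes(doc))
    return index

def was_memorized(output, index):
    # True if any CHUNK_WORDS-word window of the output appears
    # verbatim (after lowercasing) somewhere in the corpus.
    return any(h in index for h in _hashes(output))
```

The obvious limitation is that hashing only catches verbatim windows; any paraphrase produces a different hash.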

    • tal@lemmy.today
      2 days ago

      You could always just do reverse search on the open dataset to see if it’s an exact copy (or over a threshold).

      True, but an “exact copy” almost certainly isn’t what gets produced – and a derivative work doesn’t have to be an exact copy of the original; it can just be something that looks a lot like part of the original. So you’d want a pretty good chance of finding derivative works, not just exact copies.
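      The “over a threshold” idea from the parent comment is the usual answer to this: compare overlapping word n-grams and flag outputs whose similarity to an original exceeds some cutoff, so near-copies score high even when no byte-for-byte match exists. A minimal sketch, assuming word 3-grams and an arbitrary 0.5 threshold (both choices are illustrative, not standard values):

      ```python
      def shingles(text, n=3):
          # Set of all n-word sequences ("shingles") in the text.
          words = text.lower().split()
          return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

      def jaccard(a, b):
          # Overlap between two shingle sets: 1.0 = identical, 0.0 = disjoint.
          sa, sb = shingles(a), shingles(b)
          if not sa or not sb:
              return 0.0
          return len(sa & sb) / len(sa | sb)

      def looks_derivative(output, original, threshold=0.5):
          return jaccard(output, original) >= threshold
      ```

      Note the tension with the hashing idea, though: fuzzy similarity like this needs the shingles themselves (or a locality-sensitive sketch of them, e.g. MinHash), not opaque cryptographic hashes, so it reveals more about the dataset than an exact-match check does.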

      And that would mean that anyone who generates a model would need to provide access to their training corpus, which is gonna be huge – the models, which are themselves large, are a tiny fraction of the size of the training set – and I’m sure that some people generating models aren’t gonna want to provide their full training corpus.