• TWeaK@lemm.ee
      English
      arrow-up
      3
      arrow-down
      3
      ·
      10 months ago

      That is exactly how fair use works. Look up the legislation and quote where it says I’m wrong.

        • TWeaK@lemm.ee
          English
          arrow-up
          12
          arrow-down
          2
          ·
          10 months ago

          So where does that say I’m wrong?

          I said fair use covers news, education, research, criticism, or comment.

          for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research

          Then I said the next thing considered is whether it is commercial.

          In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include— (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes

          I didn’t cover everything in the law, I just covered the relevant points in a way that could be easily understood and related to the subject at hand.

          My point is that the copying AI does isn’t really research, but even if it were considered research it is absolutely commercial and thus should not have a fair use exemption.

          • General_Effort@lemmy.world
            English
            arrow-up
            2
            ·
            edit-2
            10 months ago

            You need to read this carefully. It’s a statute. It means exactly what it says.

            purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research

            Such as means that these are examples. This is not a complete list.

            the factors to be considered shall include

            All of these factors must be considered. It does not mean that other factors cannot be considered. These are not categories.

            A commercial purpose does not rule out a finding of fair use (and vice versa). It must be considered and that is all.

            I don’t think that Meta’s use can be classed as commercial. Presumably, they do hope that the research budget will pay off eventually. But what must be considered is the particular copying in question. Llama 2’s license looks to me fairly non-commercial.


            Eventually, fair use derives from the constitution. Copyright is a limitation on the freedom of the press (and of speech). But it cannot completely do away with these freedoms. The examples given in the statute here could not be banned completely even if they were not mentioned.

            The US Constitution itself allows congress to create copyrights. Or more precisely, it empowers congress to promote the Progress of Science and useful Arts by creating copyrights. That’s another limitation.

            I’ve seen a number of far-right commenters admit that this money grab would harm AI development (a “useful Art”). I think mostly these commenters hold some far-right ideology à la Ayn Rand that values property over society, but some may just be selfish and believe that they would personally benefit. Either way, it’s straight up anti-constitutional.

            • wikibot@lemmy.world
              English
              arrow-up
              2
              ·
              10 months ago

              Here’s the summary for the wikipedia article you mentioned in your comment:

              The Copyright Clause (also known as the Intellectual Property Clause, Copyright and Patent Clause, or the Progress Clause) describes an enumerated power listed in the United States Constitution (Article I, Section 8, Clause 8). The clause, which is the basis of copyright and patent laws in the United States, states that: [the United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.


            • TWeaK@lemm.ee
              English
              arrow-up
              1
              arrow-down
              1
              ·
              10 months ago

              Such as means that these are examples. This is not a complete list.

              AI developers have explicitly invoked the research exemption. That is why I focused on that. I disagree that what they do is “research” for the reasons I gave previously. Bringing up the fact that there are other exemptions is beside the point - they aren’t claiming any other exemption!

              All of these factors must be considered. It does not mean that other factors cannot be considered. These are not categories.

              Sure, but I never said that commerciality was the only thing that should be considered. My claim here is simply that it is so overwhelmingly commercial in nature that it overrides anything else and thus they should not be awarded the privilege of an exemption.

              A commercial purpose does not rule out a finding of fair use (and vice versa).

              A commercial purpose might not rule out of a finding of fair use. That does not mean it cannot rule out such a finding. All factors must be considered, but any one factor can outweigh the others.

              I never said it was an exclusive category, I just brought it up as the most significant factor - one which is not reasonably overruled by any of the others in this circumstance. In fact, every one of those arguably fails. To give detail:

              1. The copying is done in a commercial nature. They sell AI services. It’s offered very cheap right now - even for free for limited personal use - but eventually that will change as their demand for profit grows.
              2. The nature of the copied work is varied and includes all kinds of work, commercial and non-commercial. The copying is pervasive.
              3. The whole work has been copied into the training database. Significant portions of the work can be, and have been, reproduced by the finished product, in spite of the finished product allegedly not containing the original work in its database. Furthermore, even if a human genuinely believes they aren’t copying something they read before, that does not mean they are innocent of copyright infringement - it is the similarity of the two works that is the determining factor.
              4. AI work is already flooding the market and pushing out original creators. Children’s books are one area where this is happening extensively - not only does this make it harder for genuine authors to get a break in the market, but it is effectively training children to think AI work is normal. It’s not hard to see us headed to a future where people think AI is “real” and original work is “fake”, simply by volume.

              I will admit, not all of those arguments are very strong (particularly 4.). However 1. is the strongest and I think overrides any argument the other way for any other.

              I don’t think that Meta’s use can be classed as commercial. Presumably, they do hope that the research budget will pay off eventually.

              Those two statements contradict one another. Of course they want it to be commercial eventually - or, rather, they want to eventually turn a profit. Hell, AI is already being used in a commercial manner: if you want to make significant or non-personal use of AI systems currently on the market, you have to pay for it.

              Eventually, fair use derives from the constitution.

              Setting aside the fact that AI extends far beyond the borders of the US and its constitution, fair use and copyright are derived from copyright law, which is written by Congress. The Constitution grants Congress the right to write such laws, but no one is “invoking the Constitution” when they enforce copyright or claim fair use. The Constitution gives permission, but the law forms the definition.

              AI is not simply a “useful Art”. It is a commercial venture that exploits original work without duly compensating the authors of said work. Congress has a greater duty to protect those original authors than it does a business that seeks to exploit their work. I say this as someone who has never really made much of anything original myself. I play a bit of music, but don’t compose and just do covers. I probably (lol limewire definitely) infringe on copyright - but I do so exclusively in a non-commercial manner.

              Blurting out “far-right” is borderline a personal insult - one which is laughably far from the mark when addressed towards me - and points to you clutching at straws to cling to a frivolous argument.


              I now feel the need to ask, why do you so passionately defend AI businesses here? Why do you support them?

              Are you that infatuated with the novelty of their product that you have let go of objectivity?


              I also have to emphasise again that I’m a little disgusted that you made this political. You’ve tried to build an argument that “it is a Constitutional right” to infringe copyright in order to have AI tools, and you’re implying that anyone who opposes that idea is some kind of far-right nutjob. I hadn’t even heard of Ayn Rand before you mentioned her, but have you actually read her work, or did you just watch the Atlas Shrugged movie and form your opinions from internet memes?

              I’d actually probably agree with you about AI - if it was non-commercial in nature and truly for the benefit of the people. As it is, I think you are blinded by the sheen of a new toy, without realising it’s coated in lead paint.

              • TWeaK@lemm.ee
                English
                arrow-up
                1
                ·
                10 months ago

                A commercial purpose might not rule out of a finding of fair use.

                ARRRRG I spent so long reviewing this comment, over and over and over again, and still there were words wrong. I’m not editing it though, I want the comment to stay clean.

        • BreadstickNinja@lemmy.world
          English
          arrow-up
          5
          ·
          10 months ago

          Critical to understanding whether this applies is to understand “use” in the first place. I would argue it’s even more important because it’s a threshold question in whether you even need to read 107.

          17 U.S. Code § 106 - Exclusive rights in copyrighted works. Subject to sections 107 through 122, the owner of copyright under this title has the exclusive rights to do and to authorize any of the following: (1) to reproduce the copyrighted work in copies or phonorecords; (2) to prepare derivative works based upon the copyrighted work; (3) to distribute copies or phonorecords of the copyrighted work to the public by sale or other transfer of ownership, or by rental, lease, or lending; (4) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works, to perform the copyrighted work publicly; (5) in the case of literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works, including the individual images of a motion picture or other audiovisual work, to display the copyrighted work publicly; and (6) in the case of sound recordings, to perform the copyrighted work publicly by means of a digital audio transmission.

          Copyright protects just what it sounds like: the right to “copy” or reproduce a work along the lines of the examples given above. It is not clear that use in training AI falls into any of these categories. The question mainly relates to items 1 and 2.

          If you read through the court filings against OpenAI and Stability AI, much of the argument is based around trying to make a claim under case 1. If you put a model into an output loop you can get it to reproduce small sections of training data that include passages from copyrighted works, although of course nowhere near the full corpus can be retrieved because the model doesn’t contain anything close to a full data set - the models are much too small and that’s also not how the transformer architecture works. But in some cases, models can preserve and output brief sections of text or distorted images that appear highly similar to at least portions of training data. Even so, it’s not clear that this is protected under copyright law because they are small snippets that are not substitutes for the original work, and don’t affect the market for it.

          Case 2 would be relevant if an LLM were classified as a derivative work. But LLMs are also not derivative works in the conventional definition, which is things like translated or abridged versions, or different musical arrangements in the case of music.

          For these reasons, it is extremely unclear whether copyright protections are even invoked, because the nature of the use in model training does not clearly fall under any of the enumerated rights. This is not the first time this has happened, either - the DMCA of 1998 amended the Copyright Act of 1976 to add cases relating to online music distribution as the previous copyright definitions did not clearly address online filesharing.

          There are a lot of strong opinions about the ethics of training models and many people are firm believers that either it should or shouldn’t be allowed. But the legal question is much more hazy, because AI model training was not contemplated even in the DMCA. I’m watching these cases with interest because I don’t think the law is at all settled here. My personal view is that an act of Congress would be necessary to establish whether use of copyrighted works in training data, even for purposes of developing a commercial product, should be one of the enumerated protections of copyright. Under current law, I’m not certain that it is.

          • TWeaK@lemm.ee
            English
            arrow-up
            1
            arrow-down
            1
            ·
            10 months ago

            (1)to reproduce the copyrighted work in copies or phonorecords

            The works are copied in their entirety and reproduced in the training database. AI businesses do not deny this is copying, but instead claim it is research and thus has a fair use exemption.

            I argue it is not research, but product development - and furthermore, unlike traditional R&D, it is not some prototype that is different and separate from the commercial product. The prototype is the commercial product.

            (2)to prepare derivative works based upon the copyrighted work

            AI can reproduce, and has reproduced, significant portions of copyrighted work, even in spite of the fact that the finished product allegedly does not include the work in its database (it just read the training database).

            Furthermore, even if a human genuinely and honestly believes they’re writing something original, that does not matter when they reproduce work that they have read before. What determines copyright infringement is the similarity of the two works.

            If you read through the court filings against OpenAI and Stability AI, much of the argument is based around trying to make a claim under case 1.

            The position that I take is that the arguments made against OpenAI and Stability AI in court are not complete. They’re not quite good enough. However, that doesn’t mean there isn’t a valid argument that is good enough. I just hope we don’t get a ruling in favour of AI businesses simply because the people challenging them didn’t employ the right ammunition.

            With regards to Case 2, I refer back to my comment about the similarity of the work. The argument isn’t that the LLM itself is an infringement of copyright, but that the LLM, as designed by the business, infringes copyright in the same way a human would.

            I definitely agree it is all extremely unclear. However, I maintain that the textual definition of the law absolutely still encompasses the feeling that people’s work is being ripped off for a commercial venture. Because it is so commercial, original authors are being harmed as they will not see any benefit from the commercial profits.


            I would also like to point you to my other comment, which I put a lot of time into and where I expanded on many other points (link to your instance’s version): https://lemmy.world/comment/6706240