  1. Reddit sells access to its API at a high price and is about to go for an IPO; its economy is based entirely on the data made by its users and communities. It is the work of the public, getting robbed by a small group of individuals. A living example of capitalism.

  2. The Fediverse alone isn’t enough to keep public data public and usable. What if the host of each Lemmy instance also released snapshots of all the posts and modlogs, every day, over BitTorrent? Only then are we safe from hosts erasing public knowledge and from data brokers.

  • Ashy@lemmy.wtf
    ↑ 7 · 8 months ago

    Having a full backup available over torrent or some other public source would just make it even easier for data brokers. Now they don’t even have to do the scraping anymore.

    • wargreymon2023@sopuli.xyz (OP)
      ↑ 2 · edited · 8 months ago

      Anyone can train their own LLM, so you could be a data broker too. What I mean is that the problem is the capitalism over data.

      Edit: I added “capitalism of” to the title.

  • GolfNovemberUniform@lemmy.ml
    ↑ 7 ↓ 2 · 8 months ago

    If you make something public, it can be accessed by ANYONE. That’s what “public” means. If you don’t want your stuff to be used by data brokers, just don’t make it public.

    • ramble81@lemm.ee
      ↑ 4 ↓ 1 · 8 months ago

      I think this is the fundamental flaw people always overlook. They want their data to be public, yet they also want to be able to restrict how it’s used.

      You know what else does that? DRM. The thing a lot of people are massively opposed to. The goal behind it is to reach a wide audience but restrict how it can be used.

      • GolfNovemberUniform@lemmy.ml
        ↑ 2 · edited · 8 months ago

        DRM is not the only option. If they want to restrict the usage, they can just write a custom license for their publications. And wait, isn’t the problem with DRM that it uses unique device IDs?

        • ramble81@lemm.ee
          ↑ 2 ↓ 1 · 8 months ago

          And how well does that work in games? “You can’t cheat, please don’t, pinky promise?” It’s the same with LLMs. They see data, they parse it, licenses be damned. It’s as bad as people linking to the license they released their text under, or people on Facebook posting “I don’t approve my text to be used…”.

          • GolfNovemberUniform@lemmy.ml
            ↑ 3 · 8 months ago

            Well, if someone breaks the license, they can be sued. But yeah, if you don’t want your data to ever be used for anything, public is not an option. It’s the same with IRL speeches.

    • wargreymon2023@sopuli.xyz (OP)
      ↑ 1 · 8 months ago

      I don’t mean data brokers using my data; I mean that they (hosts included) close off that data and sell it at a high price. The public data is made and entered by the public.