Large language models (LLMs) like GPT-4 can identify a person’s age, location, gender and income with up to 85 per cent accuracy simply by analysing their posts on social media.

The models also picked up on subtler cues, such as location-specific slang, and could estimate a salary range from a user’s profession and location.

Reference:

arXiv DOI: 10.48550/arXiv.2310.07298

  • @SatanicNotMessianic@lemmy.ml

    Okay, I think I must absolutely be misreading this. They started with 1500 potential accounts, then hand-picked 500 whose attributes they could guess, because people did things like actually posting where they live or how much they make.

    And then they’re claiming their LLMs reach 85% accuracy on that subset? There has to be more to it than this. Were they 85% accurate on the full 1500? How did they confirm that? Or was it only on the 500? Then what’s the point?

    There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?

    • @cucumber_sandwich@lemmy.world

      > There was a study on Facebook that showed that they could predict with between 80-95% accuracy (or some crazy number like that) your gender, orientation, politics, and so on just based on your public likes. That was ten years ago at least. What is this even showing?

      Devil’s advocate (advocatus diaboli): that a large language model can do it without extra training, I guess. The Facebook study presented a statistical model on “like space”, while this study relies on text alone, a much less structured type of input.

      I’m not saying it’s a good study. Just pointing out some differences.

    • P03 Locke

      SnoopSnoo could pick out details about Reddit users from declarative statements they made in their posts, and that site has been down for years.