- cross-posted to:
- modcoord@lemmit.online
The bad, though expected, news is that according to Similarweb (via Gizmodo), Reddit traffic is back to pre-protest levels. The caveat is that some of that traffic may still be protest activity (e.g. John Oliver pics). Most interesting:
> However, Similarweb told Gizmodo traffic to the ads.reddit.com portal, where advertisers can buy ads and measure their impact, has dipped. Before the first blackout began, the ads site averaged about 14,900 visits per day. Beginning on June 13, though, the ads site averaged about 11,800 visits per day, a 20% decrease.
>
> For June 20 and 21, the most recent days for which Similarweb has estimates, the ads site got in the range of 7,500 to 9,000 visits, Carr explained, meaning that ad-buying traffic has continued to drop.
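For what it's worth, the quoted figures are roughly self-consistent; a quick check of the ~20% drop (the visit counts are Similarweb's estimates, so this is only a rough calculation):

```python
# Rough check of the quoted ad-portal traffic drop (Similarweb estimates).
before = 14_900   # average daily visits before the first blackout
after = 11_800    # average daily visits from June 13 onward

drop = (before - after) / before
print(f"{drop:.1%}")  # ~20.8%, consistent with the reported 20% decrease
```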
One of the reasons they are doing this is the rise of large language models. Those companies use Reddit data to train their models, largely because of the voting on replies. Where else can you get millions of answered questions, with actual humans rating how good each response is?
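To make that concrete, vote-ranked replies map directly onto the preference pairs used to train reward models. A minimal sketch of that idea, assuming a hypothetical thread structure (the field names are illustrative, not Reddit's actual API schema):

```python
# Minimal sketch: turning vote-ranked replies into preference pairs for
# reward-model training. The thread structure and field names below are
# hypothetical, not an actual Reddit API schema.

def preference_pairs(question: str, replies: list[dict]) -> list[dict]:
    """Pair every higher-voted reply with every lower-voted reply."""
    ranked = sorted(replies, key=lambda r: r["score"], reverse=True)
    pairs = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            if better["score"] > worse["score"]:
                pairs.append({
                    "prompt": question,
                    "chosen": better["text"],
                    "rejected": worse["text"],
                })
    return pairs

thread = [
    {"text": "Use a context manager.", "score": 412},
    {"text": "Just call close() manually.", "score": 37},
    {"text": "It doesn't matter.", "score": -5},
]
for p in preference_pairs("How do I safely close a file in Python?", thread):
    print(p["chosen"], ">", p["rejected"])
```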
The big boys in the current AI space will definitely pay for the API. They’ll likely pay a lot for it as well.
Why pay the bloated, gouging API prices when you can just write a web parser and scrape the site the old-fashioned way?
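For illustration, the "old-fashioned way" is just fetching pages and parsing the HTML. A minimal sketch with `requests` and BeautifulSoup, assuming the old.reddit.com markup (the CSS selector is the kind of thing a real scraper has to reverse-engineer and keep up to date):

```python
# Minimal scraper sketch: fetch a listing page and pull post titles out of
# the rendered HTML. Assumes old.reddit.com's markup; the selector is
# illustrative and must match whatever the live page actually uses.
import requests
from bs4 import BeautifulSoup

resp = requests.get(
    "https://old.reddit.com/r/python/",
    headers={"User-Agent": "example-scraper/0.1"},
    timeout=10,
)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for link in soup.select("a.title"):  # selector tied to old.reddit's markup
    print(link.get_text(strip=True))
```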
Scrapers can easily be shut out. Reddit obviously won't look the same afterwards, but that isn't a real obstacle for them.
Then the scrapers start using residential proxy botnets.
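Client-side, routing scrape traffic through a rotating proxy pool is trivial. A sketch, with placeholder proxy endpoints standing in for whatever a residential proxy service would actually hand out:

```python
# Sketch of rotating scrape requests through a pool of proxies.
# The proxy addresses below are placeholders; a residential proxy service
# would supply real endpoints (and usually handles rotation itself).
import itertools
import requests

PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]
rotation = itertools.cycle(PROXIES)

def fetch(url: str) -> str:
    proxy = next(rotation)
    resp = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        headers={"User-Agent": "example-scraper/0.1"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.text
```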
Then you just change the page markup repeatedly and force them to rewrite their parsers, so scraping breaks with regularity. Scraping is extremely fragile and can't adapt without human effort, which costs money.
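That fragility shows up directly in the scraper sketch above: everything hinges on a selector matching the current markup. A hedged illustration of the fallback churn that results (the alternative class names here are hypothetical):

```python
# Scrapers tend to accumulate fallback selectors as the site's markup shifts.
# The later class names here are hypothetical examples of that churn.
from bs4 import BeautifulSoup

CANDIDATE_SELECTORS = [
    "a.title",                        # old markup
    "h3._post-title a",               # hypothetical redesign
    "a[data-testid='post-title']",    # hypothetical redesign after that
]

def extract_titles(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    for selector in CANDIDATE_SELECTORS:
        titles = [a.get_text(strip=True) for a in soup.select(selector)]
        if titles:
            return titles
    # Every selector missed: the markup changed again and a human has to
    # work out the new structure -- exactly the cost the comment describes.
    raise RuntimeError("markup changed; scraper needs manual update")
```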
There is no reason other apps need to be swept up in the same cost structure as LLM enterprises.
Exactly. The LLM excuse is just that: an excuse to purge 3rd-party apps, push ads, and collect user data that is otherwise unavailable to them.
They may not need to. The models are already trained; they already got the data they need. Going forward they can just continue training on input from the users (all the folks talking to ChatGPT directly, for example).