this post was submitted on 15 Mar 2024

295 points (100.0% liked)

Technology

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

MODERATORS

alyaza@beehaw.org

TheRtRevKaiser@beehaw.org

gyrfalcon@beehaw.org

rs5th@beehaw.org

Los@beehaw.org

coldredlight@beehaw.org

SemioticStandard@beehaw.org

TheRtRevKaiser@kbin.social

remington@beehaw.org

Reddit’s Sale of User Data for AI Training Draws FTC Investigation (www.wired.com)

submitted 7 months ago by hedge@beehaw.org to c/technology@beehaw.org

The platform says it stands to make more than $200 million in coming years from Google and other companies that want user comments to feed AI projects. Regulators have questions.

Reddit said ahead of its IPO next week that licensing user posts to Google and others for AI projects could bring in $203 million of revenue over the next few years. The community-driven platform was forced to disclose Friday that US regulators already have questions about that new line of business.

In a regulatory filing, Reddit said that it received a letter from the US Federal Trade Commission on Thursday asking about “our sale, licensing, or sharing of user-generated content with third parties to train AI models.”

The FTC, the US government’s primary antitrust regulator, has the power to sanction companies found to engage in unfair or deceptive trade practices. The idea of licensing user-generated content for AI projects has drawn questions from lawmakers and rights groups about privacy risks, fairness, and copyright.

Reddit isn’t alone in trying to make a buck off licensing data, including that generated by users, for AI. Programming Q&A site Stack Overflow has signed a deal with Google, the Associated Press has signed one with OpenAI, and Tumblr owner Automattic has said it is working “with select AI companies” but will allow users to opt out of having their data passed along. None of the licensors immediately responded to requests for comment. Reddit also isn’t the only company receiving an FTC letter about data licensing, Axios reported on Friday, citing an unnamed former agency official.

It’s unclear whether the letter to Reddit is directly related to a review of any other companies.

Reddit said in Friday’s disclosure that it does not believe that it engaged in any unfair or deceptive practices but warned that dealing with any government inquiry can be costly and time-consuming. “The letter indicated that the FTC staff was interested in meeting with us to learn more about our plans and that the FTC intended to request information and documents from us as its inquiry continues,” the filing says. Reddit said the FTC letter described the scrutiny as related to “a non-public inquiry.”

Reddit, whose 17 billion posts and comments are seen by AI experts as valuable for training chatbots in the art of conversation, announced a deal last month to license the content to Google. Reddit and Google did not immediately respond to requests for comment. The FTC declined to comment.

AI chatbots like OpenAI’s ChatGPT and Google’s Gemini are seen as a competitive threat to Reddit, publishers, and other ad-supported, content-driven businesses. In the past year the prospect of licensing data to AI developers emerged as a potential upside of generative AI for some companies.

But the use of data harvested online to train AI models has raised a number of questions winding through boardrooms, courtrooms, and Congress. For Reddit and others whose data is generated by users, those questions include who truly owns the content and whether it’s fair to license it out without giving the creator a cut. Security researchers have found that AI models can leak personal data included in the material used to create them. And some critics have suggested the deals could make powerful companies even more dominant.

The Google deal was one of a “small number” of data licensing wins that Reddit has been pitching to investors as it seeks to drum up interest for shares being sold in its IPO. Reddit CEO Steve Huffman in the investor pitch described the company’s data as invaluable. “We expect our data advantage and intellectual property to continue to be a key element in the training of future” AI systems, he wrote.

In a blog post last month about the Reddit AI deal, Google vice president Rajan Patel said tapping the service’s data would provide valuable new information, without being specific about its uses. “Google will now have efficient and structured access to fresher information, as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways,” Patel wrote.

The FTC had previously shown concern about how data gets passed around in the AI market. In January, the agency announced it was requesting information from Microsoft and its partner OpenAI, the developer of ChatGPT, about their multibillion-dollar relationship. Amazon, Google, and AI chatbot maker Anthropic were also questioned about their own partnerships, the FTC said. FTC Chair Lina Khan said the agency’s concern was whether the partnerships between big companies and upstarts would lead to unfair competition.

Reddit has been licensing data to other companies for a number of years, mostly to help them understand what people are saying about them online. Researchers and software developers have used Reddit data to study online behavior and build add-ons for the platform. More recently, Reddit has contemplated selling data to help algorithmic traders looking for an edge on Wall Street.

Licensing for AI-related purposes is a newer line of business, one Reddit launched after it became clear that the conversations it hosts helped train up the AI models behind chatbots including ChatGPT and Gemini. Reddit last July introduced fees for large-scale access to user posts and comments, saying its content should not be plundered for free.

That move had the consequence of shutting down an ecosystem of free apps and add-ons for reading or enhancing Reddit. Some users staged a rebellion, shutting down parts of Reddit for days. The potential for further user protests had been one of the main risks the company disclosed to potential investors ahead of its trading debut expected next Thursday, until the FTC letter arrived.

Even_Adder@lemmy.dbzer0.com 82 points 7 months ago

I hope this helps tank that IPO.

Kalkaline@leminal.space 34 points 7 months ago

This was a company that allowed /r/jailbait to exist

NorthCountryHermit@lemm.ee 10 points 7 months ago

Just a reminder that Spez can go fuck himself.

gullible@fedia.io 65 points 7 months ago

But wasn’t the API situation caused by revenue lost through the absence of advertisements in 3rd party apps? Are you trying to tell me that faultless angel spez, former moderator of /r/jailbait, misled me?

JoMiran@lemmy.ml 34 points 7 months ago

TheRtRevKaiser@beehaw.org 22 points 7 months ago

Hi @hedge@beehaw.org, we're starting to ask users not to paste full articles in the description or comments. There have been some concerns about this practice, and we just want to try to head it off. I have no issues at all with linking to one of the several archive sites that will allow users to bypass paywalls, though.

hedge@beehaw.org 11 points 7 months ago

Ok, will do (or I guess won't do 🙂). What are the concerns?

TheRtRevKaiser@beehaw.org 24 points 7 months ago

It's a CYA thing for copyright infringement. Linking is fine, but hosting (we think) puts us on shakier legal ground, at least from what I understand.

hedge@beehaw.org 13 points 7 months ago (last edited 7 months ago)

Ah, I see. I certainly don't want to do anything that might threaten the hive! 🙂🐝

EDIT: Should I go back to the posts where I copied the whole article and delete the text?

TheRtRevKaiser@beehaw.org 10 points 7 months ago

I don't think that's necessary, just something to keep in mind going forward.

Pandantic@midwest.social 14 points 7 months ago

Someone link me to the tool I can use to turn all my Reddit posts and comments into gibberish.

ClemaX@lemm.ee 13 points 7 months ago

https://github.com/x89/Shreddit

Mastengwe@lemm.ee 12 points 7 months ago (last edited 7 months ago)

It’s pointless. There’s a copy of everything you say on Reddit. Even if you edit it, they have the original backed up.

Pandantic@midwest.social 1 point 7 months ago

Can I officially request removal of my account and all its content?

Truck_kun@beehaw.org 7 points 7 months ago

To comply with the EU's GDPR and CA's CPRA, they should have some tools to remove information.

Whether those are available to you will depend on where you live and the company's policy.

Pandantic@midwest.social 1 point 7 months ago

Sadly, I live in the USA.

Truck_kun@beehaw.org 6 points 7 months ago

To be clear, CA in this case is California, not Canada. But if you are on midwest.social, I'll assume you are in the Midwest. More states need to adopt some kind of similar legislation.

theluddite@lemmy.ml 10 points 7 months ago

An alternative approach that may interest you: https://theluddite.org/#!post/reddit-extension

MonsiuerPatEBrown@reddthat.com 13 points 7 months ago

The FTC will fine them and then let them.

The FTC wants their cut.

Alice@beehaw.org 8 points 7 months ago

I actually didn't know about Stack Overflow. That's disappointing.

GammaGames@beehaw.org 4 points 7 months ago

Yeah, they’ve been trying to get AI stuff going for like a year. The community hates it.

GrindingGears@lemmy.ca 5 points 7 months ago

*rubbing hands together* Oh this is going to be good.

That said, people are fucking stupid, especially when it comes to money.