this post was submitted on 02 Jul 2024
333 points (98.5% liked)

Privacy

32109 readers
728 users here now

A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society. With companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.

much thanks to @gary_host_laptop for the logo design :)

founded 5 years ago
[–] BubbleMonkey 97 points 4 months ago (3 children)

I find this wholly unsurprising.

All AI projects should be forced to show the entirety of their training data. I don’t give a flying fuck if they want to call it proprietary; they don’t own most of the data in the first place. Even if they bought it, it doesn’t belong to them, just like we don’t own digital movies we buy.

And if even a single piece of that training data doesn’t have proper licensing for that specific use for that specific model, or they are ever found to have withheld any of the data, the model as a whole should be immediately scrapped, along with everything even tangentially derived from it. The company should be fined fully double whatever amount of money that model generated or one year’s revenue for the company as a whole, whichever is more (no, I don’t care if this leads to bankruptcy; they should have thought about that before they stole data), and the money should go to affordable housing programs or public schools or something, whatever.

They can try again with clean data, also subject to review. One time. Second time they do the same shady shit, permanently banned from the entire sector.

But regardless, we need to stop rewarding them for this behavior. And we need the consequences to actually hurt or we can expect it to get worse, not better.

[–] AbouBenAdhem@lemmy.world 25 points 4 months ago* (last edited 4 months ago)

> All ai projects should be forced to show the entirety of their training data.

Agreed—but note that in this case the information was only discovered because the organizations involved (Common Crawl and LAION) do show their data. We should assume that proprietary data sets have similar issues—but this case should be seen as an opportunity to improve one of the rare open data sets, not to penalize its openness and further entrench proprietary sources.

[–] helpImTrappedOnline@lemmy.world 7 points 4 months ago

The problem with that plan is it requires actual punishment for a large corporation and that is bad for campaign funds.

[–] delirious_owl@discuss.online 1 points 4 months ago

Don't stop there. All software should be required to be open source, especially anything that is used by the government or by enough citizens that it impacts national security.