this post was submitted on 25 Jul 2024
373 points (98.7% liked)

Technology

[–] brucethemoose@lemmy.world 13 points 1 month ago* (last edited 1 month ago) (3 children)

Would Lemmy instances do this?

I know they can't afford to now, but hypothetically? A lot of people here don't seem to like data scraping for AI.

[–] mozz@mbin.grits.dev 38 points 1 month ago* (last edited 1 month ago) (2 children)

Your Lemmy posts are already being scraped for AI

The level of effort it would take to prevent this would be infeasible to ask of even a non-volunteer admin, let alone a volunteer, let alone literally all of them.

[–] pennomi@lemmy.world 6 points 1 month ago (1 children)

Your Lemmy posts are already being scraped for AI

Good, hopefully it’ll make AI that is slightly less toxic than the rest of the internet.

It always baffles me that people don’t want their content represented in an AI - every word you write that gets indexed is a vote for how future AI will behave.

[–] theshatterstone54@feddit.uk 1 points 1 month ago (1 children)

Wait, do you actually want those companies to make even more money from your data, and want these environmentally disastrous "bullshit generators" to keep on going? I'm not saying stopping them is realistically possible, but if I had to choose, I'd greatly prefer a world without AI.

[–] pennomi@lemmy.world 7 points 1 month ago

You cannot choose a world without AI. These systems will get built regardless of what you want.

With that in mind, the optimal (least bad) outcome is that your world views are represented in the dataset.

[–] brucethemoose@lemmy.world 4 points 1 month ago (2 children)

That's what I figured, but I am envisioning a future where Lemmy is huge and the network of admins is quite sizable.

I guess that doesn't change much?

[–] epyon22@programming.dev 17 points 1 month ago
  1. Run Lemmy instance
  2. Gain userbase
  3. Intercept data users are reading and posting from your instance and others
  4. Feed to AI
  5. Profit?

Lemmy is way less privacy-oriented than Reddit, and that's by design.

[–] theneverfox@pawb.social 4 points 1 month ago

It's structural: you can be open or locked down, and it's hard to decentralize if you're not open.

You can make it easier or harder to work with that data, but ultimately it's obfuscation. You could make it hard to parse and obscure details, but if you want decentralized federation you can't hide too much.

[–] Dave@lemmy.nz 19 points 1 month ago* (last edited 1 month ago)

You don't need to scrape. If you want to get all the content on Lemmy, just set up an instance and subscribe to all the top communities, and the instances will just send you all the content.

So there isn't really a way to monetise or block it. I guess you could only federate to a whitelist, but the biggest instances will federate by default with any new instances until they are given a reason to defederate.
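
To make the federation point concrete, here is a minimal sketch of the kind of ActivityPub `Follow` activity a new instance sends when it subscribes to a community, plus the sort of allowlist check an instance would need to refuse it. The URLs, the `build_follow` and `is_allowed` helpers, and the example domains are all hypothetical; real Lemmy traffic also carries HTTP signatures and additional fields that are not shown here.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_follow(actor_url: str, community_url: str, activity_id: str) -> dict:
    """Build a bare ActivityPub Follow activity (hypothetical helper)."""
    return {
        "@context": AS_CONTEXT,
        "id": activity_id,
        "type": "Follow",
        "actor": actor_url,       # the subscribing user on the new instance
        "object": community_url,  # the community being followed
    }


def is_allowed(remote_domain: str, allowlist: set) -> bool:
    """Allowlist-only federation: accept only known domains (hypothetical helper)."""
    return remote_domain.lower() in allowlist


# Assumed example URLs; nothing here contacts a real server.
follow = build_follow(
    "https://new-instance.example/u/reader",
    "https://lemmy.example/c/technology",
    "https://new-instance.example/activities/follow/1",
)
print(json.dumps(follow, indent=2))

# With an allowlist in place, an unknown instance would be refused.
print(is_allowed("new-instance.example", {"trusted.example"}))
```

Once a follow like this is accepted, the community's home instance pushes new posts and comments to the follower, which is why no scraping is needed. The allowlist check is the only point where that could be refused, and it only helps if every instance holding the content applies it.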

[–] cmnybo@discuss.tchncs.de 9 points 1 month ago (1 children)

Some Lemmy instances disallow indexing in robots.txt; however, indexers can choose to ignore that, and actually blocking them takes a lot more effort.
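
As a rough illustration of why robots.txt is only advisory, the sketch below uses Python's standard `urllib.robotparser` to evaluate a blanket disallow rule. The robots.txt content, the `ExampleBot` user agent, and the URL are assumptions made up for the example. The check runs entirely on the crawler's side, so a crawler that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for a hypothetical instance that disallows all indexing.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_may_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler asks itself before requesting a page."""
    return parser.can_fetch(user_agent, url)


# A polite crawler sees the rule and backs off.
allowed = polite_may_fetch("ExampleBot", "https://lemmy.example/post/1")
print(allowed)

# An impolite crawler simply skips this check; the server still has to
# block it by other means (IP rules, authentication, rate limits).
```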

[–] brucethemoose@lemmy.world 8 points 1 month ago

Some places on a "budget", like AO3, just rate-limit hard.

I don't like that solution at all though.
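
For reference, hard rate limiting is often done with something like a token bucket. The sketch below is a generic version of that technique, not a description of how AO3 or any Lemmy instance actually implements it; the `TokenBucket` class, its parameters, and the fake clock are assumptions for illustration.

```python
import time


class TokenBucket:
    """Generic token-bucket limiter (illustrative, not any site's real code)."""

    def __init__(self, rate: float, capacity: int, now=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.now = now                # injectable clock, handy for testing
        self.last = now()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        current = self.now()
        elapsed = current - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = current
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Fake clock so the example is deterministic.
fake_time = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2, now=lambda: fake_time[0])

results = [bucket.allow() for _ in range(3)]  # burst of three requests at t=0
fake_time[0] = 1.0                            # one second later
results.append(bucket.allow())
print(results)
```

The downside is visible even in this toy: the limiter cannot tell a scraper from an ordinary reader who happens to click quickly, so both get refused once the bucket is empty.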